> -----Original Message----- > From: David Terei [mailto:[email protected]] > Sent: 27 November 2011 23:32 > To: Geoffrey Mainland > Cc: [email protected] > Subject: Re: [commit: ghc] simd: Handle 4-byte aligned stack on Win32 when > generating SSE instructions. (0f251a0) > > The mangler exists to work around a lack of features in LLVM that we need > (and until very recently a bug in the way GHC did stack management). You're > telling me that Win32 only guarantees 4 byte stack alignment but LLVM > assume 16byte. This seems like a bug somewhere. > Either in LLVM or the way we are using it. Its fine to patch the mangler to > handle this but it should be considered a hack and something better > pursued.
LLVM does not assume 16-byte stack alignment. Instead, on platforms where 16-byte stack alignment is not guaranteed, it 1) always outputs a function prologue that 2) aligns the stack to a 16-byte boundary with an and instructions, and it also 3) disables tail calls. Because LLVM aligns the stack for a function that has SSE register spills, it then always generates movaps instructions for the spills. I have added a hack to LLVM that disables the stack-aligning code so that we 1) don't get ebp scribbled over by the prologue and 2) recover the use of tail calls. I should just further hack LLVM to generate movups instructions directly instead of changing the mangler. > So my question is more, what better is being pursued? or can be pursued. I can just patch LLVM. Keep in mind that the simd branch is not going to be merged to master any time soon, so please don't worry about removing the mangler---if it disappears and I want to merge from master, it's my problem, not yours :) Patching LLVM is still disgusting. We can start a discussion with the LLVM folks about the proper way to accomplish what I want. If you can point me in the right direction (which may just be "post to llvm-devel") I would appreciate it. We may also just decide not to support SSE on Win32. That wouldn't bother me much except for the fact that my development machine at work is a Windows box. > On 27 November 2011 15:10, Geoffrey Mainland <[email protected]> > wrote: > > LLVM assumes the stack is 16-byte aligned---either because the > > platform aligns the stack or because the function prologue aligns it. > > Therefore SSE register spills always generate movaps instructions, > > necessitating my hack. We can certainly discuss this with the LLVM > > folks, though I don't see any other way around the issue in the near term. > Maybe for 3.1... > > > > Geoff > > > > On 11/27/11 11:03 PM, "David Terei" <[email protected]> wrote: > > > >>Is this really the only solution? I don't understand the issue > >>completely but it sounds like something you could talk to the LLVM > >>folks about. I've recently fixed half the reason the LLVM Mangler is > >>needed and hope to fix the other one someday and kill the Mangler so > >>I'm against adding any more to it. > >> > >>On 27 November 2011 14:15, Geoffrey Mainland > <[email protected]> > >>wrote: > >>> Repository : ssh://darcs.haskell.org//srv/darcs/ghc > >>> > >>> On branch : simd > >>> > >>> > >>>http://hackage.haskell.org/trac/ghc/changeset/0f251a04c21ca4a7de5c00 > 1 > >>>d632 > >>>e15a6fdaf948a > >>> > >>>>--------------------------------------------------------------- > >>> > >>> commit 0f251a04c21ca4a7de5c001d632e15a6fdaf948a > >>> Author: Geoffrey Mainland <[email protected]> > >>> Date: Sat Nov 26 12:45:23 2011 +0000 > >>> > >>> Handle 4-byte aligned stack on Win32 when generating SSE > >>>instructions. > >>> > >>> Win32 only guarantees that the stack is 4-byte aligned, so > >>>rewrite all movaps > >>> instructions to movups in the mangler on that platform. We > >>>already generate > >>> movups for explicit loads and stores in the LLVM back-end because > >>>they are > >>> marked as potentially unaligned. However, LLVM generates movaps > >>>for registers > >>> spills, and I don't see any way to fix that except by rewriting > >>>the assembly > >>> output in the mangler. > >>> > >>>>--------------------------------------------------------------- > >>> > >>> compiler/llvmGen/LlvmMangler.hs | 23 +++++++++++++++++++++++ > >>> 1 files changed, 23 insertions(+), 0 deletions(-) > >>> > >>> diff --git a/compiler/llvmGen/LlvmMangler.hs > >>>b/compiler/llvmGen/LlvmMangler.hs > >>> index 981bbf2..e042567 100644 > >>> --- a/compiler/llvmGen/LlvmMangler.hs > >>> +++ b/compiler/llvmGen/LlvmMangler.hs > >>> @@ -60,7 +60,11 @@ llvmFixupAsm f1 f2 = do > >>> w <- openBinaryFile f2 WriteMode > >>> ss <- readSections r w > >>> hClose r > >>> +#if mingw32_TARGET_OS > >>> + let fixed = (map fixMovaps . fixTables) ss #else > >>> let fixed = fixTables ss > >>> +#endif > >>> mapM_ (writeSection w) fixed > >>> hClose w > >>> return () > >>> @@ -107,6 +111,25 @@ writeSection w (hdr, cts) = do > >>> B.hPutStrLn w hdr > >>> B.hPutStrLn w cts > >>> > >>> +fixMovaps :: Section -> Section > >>> +fixMovaps (hdr, cts) = > >>> + (hdr, loop idxs cts) > >>> + where > >>> + loop :: [Int] -> B.ByteString -> B.ByteString > >>> + loop [] cts = cts > >>> + > >>> + loop (i : is) cts = > >>> + loop is (hd `B.append` movups `B.append` B.drop 6 tl) > >>> + where > >>> + (hd, tl) = B.splitAt i cts > >>> + > >>> + idxs :: [Int] > >>> + idxs = B.findSubstrings movaps cts > >>> + > >>> + movaps, movups :: B.ByteString > >>> + movaps = B.pack "movaps" > >>> + movups = B.pack "movups" > >>> + > >>> -- | Reorder and convert sections so info tables end up next to the > >>> -- code. Also does stack fixups. > >>> fixTables :: [Section] -> [Section] _______________________________________________ Cvs-ghc mailing list [email protected] http://www.haskell.org/mailman/listinfo/cvs-ghc
