> -----Original Message-----
> From: David Terei [mailto:[email protected]]
> Sent: 27 November 2011 23:32
> To: Geoffrey Mainland
> Cc: [email protected]
> Subject: Re: [commit: ghc] simd: Handle 4-byte aligned stack on Win32 when
> generating SSE instructions. (0f251a0)
> 
> The mangler exists to work around a lack of features in LLVM that we need
> (and until very recently a bug in the way GHC did stack management). You're
> telling me that Win32 only guarantees 4 byte stack alignment but LLVM
> assume 16byte. This seems like a bug somewhere.
> Either in LLVM or the way we are using it. Its fine to patch the mangler to
> handle this but it should be considered a hack and something better
> pursued.

LLVM does not assume 16-byte stack alignment. Instead, on platforms where 
16-byte stack alignment is not guaranteed, it 1) always outputs a function 
prologue that 2) aligns the stack to a 16-byte boundary with an and 
instructions, and it also 3) disables tail calls.

Because LLVM aligns the stack for a function that has SSE register spills, it 
then always generates movaps instructions for the spills.

I have added a hack to LLVM that disables the stack-aligning code so that we 1) 
don't get ebp scribbled over by the prologue and 2) recover the use of tail 
calls. I should just further hack LLVM to generate movups instructions directly 
instead of changing the mangler.

> So my question is more, what better is being pursued? or can be pursued.

I can just patch LLVM. Keep in mind that the simd branch is not going to be 
merged to master any time soon, so please don't worry about removing the 
mangler---if it disappears and I want to merge from master, it's my problem, 
not yours :)

Patching LLVM is still disgusting. We can start a discussion with the LLVM 
folks about the proper way to accomplish what I want. If you can point me in 
the right direction (which may just be "post to llvm-devel") I would appreciate 
it. We may also just decide not to support SSE on Win32. That wouldn't bother 
me much except for the fact that my development machine at work is a Windows 
box.

> On 27 November 2011 15:10, Geoffrey Mainland <[email protected]>
> wrote:
> > LLVM assumes the stack is 16-byte aligned---either because the
> > platform aligns the stack or because the function prologue aligns it.
> > Therefore SSE register spills always generate movaps instructions,
> > necessitating my hack. We can certainly discuss this with the LLVM
> > folks, though I don't see any other way around the issue in the near term.
> Maybe for 3.1...
> >
> > Geoff
> >
> > On 11/27/11 11:03 PM, "David Terei" <[email protected]> wrote:
> >
> >>Is this really the only solution? I don't understand the issue
> >>completely but it sounds like something you could talk to the LLVM
> >>folks about. I've recently fixed half the reason the LLVM Mangler is
> >>needed and hope to fix the other one someday and kill the Mangler so
> >>I'm against adding any more to it.
> >>
> >>On 27 November 2011 14:15, Geoffrey Mainland
> <[email protected]>
> >>wrote:
> >>> Repository : ssh://darcs.haskell.org//srv/darcs/ghc
> >>>
> >>> On branch  : simd
> >>>
> >>>
> >>>http://hackage.haskell.org/trac/ghc/changeset/0f251a04c21ca4a7de5c00
> 1
> >>>d632
> >>>e15a6fdaf948a
> >>>
> >>>>---------------------------------------------------------------
> >>>
> >>> commit 0f251a04c21ca4a7de5c001d632e15a6fdaf948a
> >>> Author: Geoffrey Mainland <[email protected]>
> >>> Date:   Sat Nov 26 12:45:23 2011 +0000
> >>>
> >>>    Handle 4-byte aligned stack on Win32 when generating SSE
> >>>instructions.
> >>>
> >>>    Win32 only guarantees that the stack is 4-byte aligned, so
> >>>rewrite all movaps
> >>>    instructions to movups in the mangler on that platform. We
> >>>already generate
> >>>    movups for explicit loads and stores in the LLVM back-end because
> >>>they are
> >>>    marked as potentially unaligned. However, LLVM generates movaps
> >>>for registers
> >>>    spills, and I don't see any way to fix that except by rewriting
> >>>the assembly
> >>>    output in the mangler.
> >>>
> >>>>---------------------------------------------------------------
> >>>
> >>>  compiler/llvmGen/LlvmMangler.hs |   23 +++++++++++++++++++++++
> >>>  1 files changed, 23 insertions(+), 0 deletions(-)
> >>>
> >>> diff --git a/compiler/llvmGen/LlvmMangler.hs
> >>>b/compiler/llvmGen/LlvmMangler.hs
> >>> index 981bbf2..e042567 100644
> >>> --- a/compiler/llvmGen/LlvmMangler.hs
> >>> +++ b/compiler/llvmGen/LlvmMangler.hs
> >>> @@ -60,7 +60,11 @@ llvmFixupAsm f1 f2 = do
> >>>     w <- openBinaryFile f2 WriteMode
> >>>     ss <- readSections r w
> >>>     hClose r
> >>> +#if mingw32_TARGET_OS
> >>> +    let fixed = (map fixMovaps . fixTables) ss #else
> >>>     let fixed = fixTables ss
> >>> +#endif
> >>>     mapM_ (writeSection w) fixed
> >>>     hClose w
> >>>     return ()
> >>> @@ -107,6 +111,25 @@ writeSection w (hdr, cts) = do
> >>>     B.hPutStrLn w hdr
> >>>   B.hPutStrLn w cts
> >>>
> >>> +fixMovaps :: Section -> Section
> >>> +fixMovaps (hdr, cts) =
> >>> +    (hdr, loop idxs cts)
> >>> +  where
> >>> +    loop :: [Int] -> B.ByteString -> B.ByteString
> >>> +    loop [] cts = cts
> >>> +
> >>> +    loop (i : is) cts =
> >>> +        loop is (hd `B.append` movups `B.append` B.drop 6 tl)
> >>> +      where
> >>> +        (hd, tl) = B.splitAt i cts
> >>> +
> >>> +    idxs :: [Int]
> >>> +    idxs = B.findSubstrings movaps cts
> >>> +
> >>> +    movaps, movups :: B.ByteString
> >>> +    movaps = B.pack "movaps"
> >>> +    movups = B.pack "movups"
> >>> +
> >>>  -- | Reorder and convert sections so info tables end up next to the
> >>>  -- code. Also does stack fixups.
> >>>  fixTables :: [Section] -> [Section]


_______________________________________________
Cvs-ghc mailing list
[email protected]
http://www.haskell.org/mailman/listinfo/cvs-ghc

Reply via email to