Thank you for doing this work!  I'm going to take some time over the
weekend and integrate it into my own work on the build system.  I'll
fix/replace your autoconf patch to integrate with what I've already
done - I did the work to produce a preprocessor variable CC_FLAVOR
defined as "gcc" or "clang", depending on what we were configured
against, which is a better way to get the information.

A note - one reason my build-system work hasn't seen the light of day
is that git mystifies me.  Is the preferred way of submitting my work
that I should put my copy of the repo someplace publicly visible and
post a link to it on this mailing list, as you have done?  I couldn't
figure out how to get git to generate an email encapsulating the
patch...


On Thu, Jun 30, 2011 at 12:43 PM, David Peixotto <[email protected]> wrote:
> I have made the changes necessary to compile GHC with llvm-gcc. The major
> change was to use the pthread API for thread-local storage to access the gct
> variable during garbage collection. My measurements indicate this causes an
> average slowdown of about 5% for GC-heavy programs. The changes are available
> from the `clang` branch on my GitHub fork.
>
>    git://github.com/dmpots/ghc.git clang
>
> The branch contains only two new patches. One patch changes the gc to use 
> pthreads for thread local storage when the `llvm_CC_FLAVOR` symbol is defined 
> by the preprocessor, and the other patch defines the symbol based on an 
> autoconf test. The autoconf patch may be a bit heavy-handed because it is
> really just checking whether the `__llvm__` symbol is defined by the C
> compiler. I based it on an answer to a Stack Overflow question:
> http://stackoverflow.com/questions/1617877/how-to-detect-llvm-and-its-version-through-define-directives.
> I'm open to suggestions on improving either patch.
>
> I've been using the following configure line to test the llvm-gcc support.
>
>    $ CC=/usr/bin/llvm-gcc ./configure --with-gcc=/usr/bin/llvm-gcc
>
> The validate script finds the same errors with or without my patches.
>
> For the performance measurements, I looked at the fibon benchmarks and the
> nofib gc benchmarks. Both suites were run on Mac OS X 10.6 with a
> 64-bit GHC.
>
> The fibon benchmarks show an average slowdown of 3% in execution time, but 
> the gc time slows down by an average of 10%. The nofib gc benchmarks show an 
> average execution time slowdown of 5%.
>
> The detailed results are below. In the fibon results, a negative number means
> that the llvm-gcc version was slower and a positive number means it was
> faster. The Efficiency column is the percentage of total execution time spent
> outside the garbage collector (i.e., in the mutator).
>
> Fibon Results
> -----------------------------------------------------------------
>                MutCPUTime    GCCPUTime TotalCPUTime   Efficiency
> Agum                +8.52%      -10.95%       +4.14%       78.56%
> BinaryTrees         -0.06%      -16.01%       -6.09%       64.40%
> Blur                -0.19%       -3.03%       -0.22%       99.06%
> Bzlib               -2.65%       -3.08%       -2.66%       99.90%
> Chameneos          -22.01%      -11.22%      -21.95%       99.55%
> Cpsa                +1.82%       -8.68%       +0.88%       91.23%
> Crypto              -1.13%      -15.03%       -8.91%       48.58%
> FFT2d               +3.98%       -5.58%       +3.52%       95.26%
> FFT3d               +0.44%       -3.25%       +0.35%       97.50%
> Fannkuch            -1.92%       -7.41%       -2.27%       93.87%
> Fgl                 +3.18%      -11.23%       -2.71%       60.60%
> Fst                 -0.21%      -19.60%       -3.84%       81.98%
> Funsat              +0.55%      -11.31%       -4.35%       60.39%
> Gf                  +0.29%       -9.78%       -2.80%       70.16%
> HaLeX               +3.59%      -16.01%       +2.86%       96.39%
> Happy               -0.37%      -13.65%       -6.07%       59.52%
> Hgalib              +2.09%       -9.85%       +1.14%       92.11%
> Laplace             +0.04%       -5.34%       -0.17%       96.07%
> MMult              -13.61%       -6.31%      -13.38%       97.27%
> Mandelbrot          +0.12%       -3.76%       +0.11%       99.81%
> Nbody               +0.11%       -4.12%       +0.08%       99.35%
> Palindromes        +15.02%      -15.63%       -4.46%       41.55%
> Pappy               +2.18%      -11.66%       -9.90%       20.66%
> Pidigits            +0.49%      -21.80%       -3.83%       81.35%
> QuickCheck          -2.02%       +2.75%       -1.14%       81.42%
> Regex               +3.39%       -6.62%       +2.92%       95.41%
> Simgi               +5.20%      -16.37%       -0.10%       76.39%
> SpectralNorm        +0.13%         ----       +0.13%      100.00%
> TernaryTrees        +2.79%       -9.93%       -3.66%       51.37%
> Xsact               -0.52%      -14.58%       -6.84%       57.94%
> -----------------------------------------------------------------
> Min                -22.01%      -21.80%      -21.95%       20.66%
> Mean                +0.31%       -9.97%       -2.97%       79.59%
> Max                +15.02%       +2.75%       +4.14%      100.00%
>
>
> In the nofib results, a positive number means the llvm-gcc version was slower
> and a negative number means it was faster (sorry for the inconsistency!).
>
> NoFib Results
> ------------------------------------------------------------------------------
>        Program           Size    Allocs   Runtime   Elapsed  TotalMem
> ------------------------------------------------------------------------------
>        circsim         -75.0%     +0.0%     +5.6%     +5.1%     +0.0%
>    constraints         -75.6%     +0.0%     +6.6%     +6.2%     +0.0%
>       gc_bench         -75.9%     +0.0%     +8.6%     +8.4%     +0.0%
>           lcss         -76.2%     +0.0%     +7.7%     +6.8%     +0.0%
>          power         -74.6%     +0.0%     +5.4%     +4.6%     +0.9%
>     spellcheck         -80.3%     +0.0%     -1.0%     -1.9%     +0.0%
> ------------------------------------------------------------------------------
>            Min         -80.3%     +0.0%     -1.0%     -1.9%     +0.0%
>            Max         -74.6%     +0.0%     +8.6%     +8.4%     +0.9%
>  Geometric Mean         -76.3%     +0.0%     +5.4%     +4.8%     +0.1%
>
> On Jun 27, 2011, at 6:18 PM, David Peixotto wrote:
>
>> I'll take a look at getting the llvm-gcc route going by switching the gct
>> variable to use pthread_getspecific() on Mac OS X. I can do some
>> benchmarking to measure the impact.
>>
>> I was playing around just to get the compilation to succeed. After a small
>> change in StgCRun.c, the compile went through, but then the stage 2
>> compiler segfaulted because of the global register variables.
>>
>> I thought that llvm-gcc would complain about the global register variables,
>> but it seems to accept them and generate the assembly code to read and write
>> them. The only problem is that it also uses these registers for other
>> purposes, so gct was getting stomped on, which caused the segfaults.
>>
>> So from what I can see llvm-gcc dies at compile time when given __thread 
>> variables and accepts global register variables but can generate code that 
>> stomps on the register.
>>
>> -David
>>
>> On Jun 24, 2011, at 3:23 AM, Simon Marlow wrote:
>>
>>> On 21/06/2011 05:51, Manuel M T Chakravarty wrote:
>>>> austin seipp:
>>>>> (CC'ing Dan so he can chime in, for those who don't IRC.)
>>>>>
>>>>> Dan Knapp (dankna on freenode) is running OS X Lion on his machine
>>>>> (and the corresponding new Xcode tools, I believe), and apparently Apple
>>>>> has gone the whole way in the next release, making 'gcc' a symbolic
>>>>> link to 'llvm-gcc' by default.
>>>>
>>>> Just like my prediction ;)
>>>>
>>>>> It's likely that will soon be
>>>>> clang, given llvm-gcc is already deprecated as of LLVM 2.9. There is
>>>>> still a regular GCC bundled with Lion apparently, ISTR Dan saying the
>>>>> executable was under /Developer under the name
>>>>> 'i686-apple-darwin-gcc-4.2' or somesuch, but I can't verify that (Snow
>>>>> Leopard here.) Anyone with lion want to chime in?
>>>>
>>>> I would assume that 'gcc-4.2' will still point to the traditional GCC for 
>>>> a while.  Especially with C++, clang is still behind and there are still 
>>>> the odd code generator bugs in LLVM that require code generation with 
>>>> traditional gcc.
>>>>
>>>>> Dan was working on build fixes/RTS fixes last week to try and make GHC
>>>>> build cleanly with the pthread_getspecific and work with compilers
>>>>> other than GCC. I think he did make some good headway in this area,
>>>>> but his work isn't done either.
>>>>>
>>>>> Considering global register variables are a rather rare and intricate
>>>>> GCC extension, it's much more likely that we will see __thread support
>>>>> in Clang first (TLS also has implications for C++0x I've heard them
>>>>> say.) It's not on their short-term TODO list, however. In the
>>>>> meantime, if Apple were to remove GCC entirely for some reason, we'd
>>>>> still need Dan's patches, wouldn't we?
>>>>
>>>> If we could move to clang (on OS X) that would be ideal, but as I wrote 
>>>> above I seriously doubt that Apple will entirely remove gcc (at least not 
>>>> before whatever cat comes after Lion).  So, for the time being, and until 
>>>> we can use clang, I think it would be wise to use 'gcc-4.2' as a default 
>>>> on OS X (instead of 'gcc', which appears to morph into llvm-gcc soon).  If 
>>>> we do that for GHC 7.2, then GHC 7.2 won't break once Apple flips the sym 
>>>> link over.
>>>>
>>>> Simon, what do you think?
>>>
>>> I have no strong opinions; you guys know the platform much better than me,
>>> so I'm happy to go with whatever you think makes the most sense.
>>>
>>> One thing I would keep an eye on is the performance of the GC, because the 
>>> handling of the gct thread-local variable is critical.  I can help you with 
>>> some quick benchmarks if you want to test out changes.
>>>
>>> Cheers,
>>>      Simon
>>>
>>>
>>>
>>>> Manuel
>>>>
>>>>
>>>>> On Sun, Jun 19, 2011 at 9:43 PM, Manuel M T Chakravarty
>>>>> <[email protected]>  wrote:
>>>>>> As llvm-gcc on OS X seems to require some work, I wonder whether we 
>>>>>> should by default build with the 'gcc-4.2' executable on OS X (which 
>>>>>> uses the traditional gcc backend), instead of the generic 'gcc' 
>>>>>> (probably still using 'gcc' as a fallback in configure if 'gcc-4.2' is 
>>>>>> not available).  Then, when Apple makes the switch, binary GHC packages 
>>>>>> will continue to work.
>>>>>>
>>>>>> Manuel
>>>>>>
>>>>>> PS: I am all for resolving the problems with llvm-gcc, but that will 
>>>>>> likely take a while.  It'd be good to get a fix into 7.2, though.
>>>>>>
>>>>>> Simon Marlow:
>>>>>>> On 01/06/2011 13:30, Manuel M T Chakravarty wrote:
>>>>>>>> Simon Marlow:
>>>>>>>>> On 01/06/2011 07:11, Manuel M T Chakravarty wrote:
>>>>>>>>>> Simon Marlow:
>>>>>>>>>>> On 30/05/2011 14:59, Manuel M T Chakravarty wrote:
>>>>>>>>>>>> It is no secret that Apple is moving away from the traditional GCC
>>>>>>>>>>>> backend to LLVM.  In fact, Xcode (which bundles all command line
>>>>>>>>>>>> developer tools on the Mac) today comes with two flavours of gcc:
>>>>>>>>>>>> 'gcc' and 'llvm-gcc', which AFAIK only differ in the backend that is
>>>>>>>>>>>> being used.  Currently, the default is the traditional GCC backend,
>>>>>>>>>>>> but it takes no precognition to realise that this will eventually
>>>>>>>>>>>> change.  The 'gcc' executable will use the LLVM backend and, at least
>>>>>>>>>>>> for a while, the traditional backend will still be available under a
>>>>>>>>>>>> different name.
>>>>>>>>>>>>
>>>>>>>>>>>> Unfortunately, GHC will break at this point as the LLVM backend does
>>>>>>>>>>>> not support pinned global registers.  ('llvm-gcc' happily accepts the
>>>>>>>>>>>> register assignment, but fails with a runtime error during code
>>>>>>>>>>>> generation.)
>>>>>>>>>>>
>>>>>>>>>>> This shouldn't be a problem.  We don't use pinned global registers 
>>>>>>>>>>> any more, except in one place - the GC (see rts/sm/GCTDecl.h).  
>>>>>>>>>>> There it's optional, but you lose a bit of performance by not using 
>>>>>>>>>>> a pinned register.  It's not a huge deal.
>>>>>>>>>>>
>>>>>>>>>>> Have you tried building GHC with llvm-gcc?  I think I tried it on 
>>>>>>>>>>> the RTS a year or so ago to check the LLVM output against gcc (LLVM 
>>>>>>>>>>> wasn't quite as good at the time).
>>>>>>>>>>
>>>>>>>>>> Yes, I tried and it failed, while compiling the RTS, with
>>>>>>>>>>
>>>>>>>>>>       sorry, unimplemented: LLVM cannot handle register variable ‘R1’, report a bug
>>>>>>>>>>
>>>>>>>>>> This was using the 64bit version of GHC.  I'll have a closer look.
>>>>>>>>>
>>>>>>>>> Perhaps that was when compiling StgCRun.c? It doesn't actually need 
>>>>>>>>> register variables (on x86_64 at least), but it does include the 
>>>>>>>>> header files, so that probably needs some #ifdefery somewhere for 
>>>>>>>>> llvm-gcc.
>>>>>>>>
>>>>>>>> Yes, it's in 'StgCRun.c'.   Ok, and how about on i386 (or do you want
>>>>>>>> to phase that arch out)?
>>>>>>>
>>>>>>> It doesn't look like the x86 code in StgCRun.c uses registers either. 
>>>>>>> The sparc version does, but it could be rewritten.
>>>>>>>
>>>>>>>>> The other place, as I mentioned above, is rts/sm/GCTDecl.h, which 
>>>>>>>>> will need to use a different method for declaring the garbage 
>>>>>>>>> collector's thread-local state variable, gct.  On x86_64 I found that 
>>>>>>>>> using a fixed register was the fastest, but using a thread-local 
>>>>>>>>> variable (the __thread modifier) also works.
>>>>>>>>
>>>>>>>> Just to make sure I understand correctly, are you saying that using a
>>>>>>>> thread-local variable is already implemented as an option,
>>>>>>>
>>>>>>> Yes - look at the series of #ifdefs in that file, it's pretty 
>>>>>>> straightforward to change how gct is declared for a particular platform.
>>>>>>>
>>>>>>> However, I've just done some poking around and it seems that __thread 
>>>>>>> is not supported on OS X:
>>>>>>>
>>>>>>> http://lifecs.likai.org/2010/05/mac-os-x-thread-local-storage.html
>>>>>>>
>>>>>>> see also this thread about Clang:
>>>>>>>
>>>>>>> http://lists.cs.uiuc.edu/pipermail/cfe-dev/2011-March/013673.html
>>>>>>>
>>>>>>> It seems there might be support for __thread in the future, but not in 
>>>>>>> the short term.
>>>>>>>
>>>>>>> It seems our very own David Peixotto tried building GHC with Clang a 
>>>>>>> year ago and ran into the same thing:
>>>>>>>
>>>>>>> http://www.dmpots.com/blog/2010/05/08/building-ghc-with-clang.html
>>>>>>>
>>>>>>> So this is less than ideal.  The short-term fix would be to #define gct
>>>>>>> to be a call to pthread_getspecific().  The call will be inlined -
>>>>>>> the OS X headers define pthread_getspecific in terms of some inline 
>>>>>>> assembly, but the optimiser won't know anything about the inline 
>>>>>>> assembly so it won't be able to common up multiple loads of gct, and 
>>>>>>> that probably means it won't perform well.  If that's the case, then 
>>>>>>> the solution is to load up gct into a temporary in the 
>>>>>>> performance-critical functions in the GC (evacuate(), 
>>>>>>> scavenge_block()), and add it as an argument to inline functions.  I'd 
>>>>>>> rather avoid having to do all that if possible.
>>>>>>>
>>>>>>> If you want to benchmark the GC, there are some good programs in 
>>>>>>> nofib/gc.
>>>>>>>
>>>>>>> Cheers,
>>>>>>>      Simon
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> Cvs-ghc mailing list
>>>>>> [email protected]
>>>>>> http://www.haskell.org/mailman/listinfo/cvs-ghc
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Regards,
>>>>> Austin
>>>>
>>>
>>>
>>>
>>
>>
>>
>
>
>



-- 
Dan Knapp
"An infallible method of conciliating a tiger is to allow oneself to
be devoured." (Konrad Adenauer)

