Thank you for doing this work! I'm going to take some time over the weekend and integrate it into my own work on the build system. I'll fix/replace your autoconf patch to integrate with what I've already done - I did the work to produce a preprocessor variable CC_FLAVOR defined as "gcc" or "clang", depending on what we were configured against, which is a better way to get the information.
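Here is a rough illustration of the distinction CC_FLAVOR captures. It is a minimal compile-time sketch in C, not Dan's configure-based implementation. The names cc_flavor() and cc_uses_llvm_backend() are hypothetical, and the separate "llvm-gcc" value is an assumption that goes beyond the "gcc"/"clang" pair described above. The sketch relies only on the standard predefined macros `__clang__`, `__llvm__` (the symbol David's autoconf test checks) and `__GNUC__`.

```c
/* Hypothetical helper, not the configure-time CC_FLAVOR: classify the C
 * compiler from its predefined macros. Test the most specific macro first,
 * because clang also defines __llvm__ and __GNUC__, and llvm-gcc also
 * defines __GNUC__. */
static const char *cc_flavor(void)
{
#if defined(__clang__)
    return "clang";
#elif defined(__llvm__)
    return "llvm-gcc";
#elif defined(__GNUC__)
    return "gcc";
#else
    return "unknown";
#endif
}

/* Nonzero for any LLVM-based compiler (clang or llvm-gcc), which is the
 * case where this thread reports trouble with pinned global registers. */
static int cc_uses_llvm_backend(void)
{
#if defined(__llvm__)
    return 1;
#else
    return 0;
#endif
}
```

The order of the checks is the important design choice. A test for `__GNUC__` alone cannot tell the traditional GCC backend apart from the LLVM-based compilers.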
A note - one reason my build-system work hasn't seen the light of day is that git mystifies me. Is the preferred way to submit my work to put my copy of the repo someplace publicly visible and post a link to it on this mailing list, as you have done? I couldn't figure out how to get git to generate an email encapsulating the patch...

On Thu, Jun 30, 2011 at 12:43 PM, David Peixotto <[email protected]> wrote:
> I have made the changes necessary to compile GHC with llvm-gcc. The major change was to use the pthread API for thread-local storage to access the gct variable during garbage collection. My measurements indicate this causes an average slowdown of about 5% for gc-heavy programs. The changes are available from the `clang` branch on my github fork.
>
> git://github.com/dmpots/ghc.git clang
>
> The branch contains only two new patches. One patch changes the gc to use pthreads for thread-local storage when the `llvm_CC_FLAVOR` symbol is defined by the preprocessor, and the other patch defines the symbol based on an autoconf test. The autoconf patch may be a bit heavy-handed because it is really just checking whether the `__llvm__` symbol is defined by the C compiler. I based it on an answer to a Stack Overflow question:
> http://stackoverflow.com/questions/1617877/how-to-detect-llvm-and-its-version-through-define-directives
> I'm open to suggestions on improving either patch.
>
> I've been using the following configure line to test the llvm-gcc support.
>
> $ CC=/usr/bin/llvm-gcc ./configure --with-gcc=/usr/bin/llvm-gcc
>
> The validate script finds the same errors with or without my patches.
>
> For the performance measurements, I looked at the fibon benchmarks and the nofib gc benchmarks. Both benchmarks were tested on Mac OS X 10.6 with a 64-bit GHC.
>
> The fibon benchmarks show an average slowdown of 3% in execution time, but the gc time slows down by an average of 10%.
> The nofib gc benchmarks show an average execution time slowdown of 5%.
>
> The detailed results are below. In the fibon results, a negative number means that the llvm-gcc version is slower and a positive number means it was faster. The Efficiency column is the percentage of total execution time spent in the mutator, i.e. outside the garbage collector.
>
> Fibon Results
> ---------------------------------------------------------------
>                 MutCPUTime  GCCPUTime  TotalCPUTime  Efficiency
> Agum                +8.52%    -10.95%        +4.14%      78.56%
> BinaryTrees         -0.06%    -16.01%        -6.09%      64.40%
> Blur                -0.19%     -3.03%        -0.22%      99.06%
> Bzlib               -2.65%     -3.08%        -2.66%      99.90%
> Chameneos          -22.01%    -11.22%       -21.95%      99.55%
> Cpsa                +1.82%     -8.68%        +0.88%      91.23%
> Crypto              -1.13%    -15.03%        -8.91%      48.58%
> FFT2d               +3.98%     -5.58%        +3.52%      95.26%
> FFT3d               +0.44%     -3.25%        +0.35%      97.50%
> Fannkuch            -1.92%     -7.41%        -2.27%      93.87%
> Fgl                 +3.18%    -11.23%        -2.71%      60.60%
> Fst                 -0.21%    -19.60%        -3.84%      81.98%
> Funsat              +0.55%    -11.31%        -4.35%      60.39%
> Gf                  +0.29%     -9.78%        -2.80%      70.16%
> HaLeX               +3.59%    -16.01%        +2.86%      96.39%
> Happy               -0.37%    -13.65%        -6.07%      59.52%
> Hgalib              +2.09%     -9.85%        +1.14%      92.11%
> Laplace             +0.04%     -5.34%        -0.17%      96.07%
> MMult              -13.61%     -6.31%       -13.38%      97.27%
> Mandelbrot          +0.12%     -3.76%        +0.11%      99.81%
> Nbody               +0.11%     -4.12%        +0.08%      99.35%
> Palindromes        +15.02%    -15.63%        -4.46%      41.55%
> Pappy               +2.18%    -11.66%        -9.90%      20.66%
> Pidigits            +0.49%    -21.80%        -3.83%      81.35%
> QuickCheck          -2.02%     +2.75%        -1.14%      81.42%
> Regex               +3.39%     -6.62%        +2.92%      95.41%
> Simgi               +5.20%    -16.37%        -0.10%      76.39%
> SpectralNorm        +0.13%       ----        +0.13%     100.00%
> TernaryTrees        +2.79%     -9.93%        -3.66%      51.37%
> Xsact               -0.52%    -14.58%        -6.84%      57.94%
> ---------------------------------------------------------------
> Min                -22.01%    -21.80%       -21.95%      20.66%
> Mean                +0.31%     -9.97%        -2.97%      79.59%
> Max                +15.02%     +2.75%        +4.14%     100.00%
>
> In the nofib results, a positive number means the llvm-gcc version was slower and a negative number means it was faster (sorry for the inconsistency!)
>
> NoFib Results
> ------------------------------------------------------------
> Program             Size  Allocs  Runtime  Elapsed  TotalMem
> ------------------------------------------------------------
> circsim           -75.0%   +0.0%    +5.6%    +5.1%     +0.0%
> constraints       -75.6%   +0.0%    +6.6%    +6.2%     +0.0%
> gc_bench          -75.9%   +0.0%    +8.6%    +8.4%     +0.0%
> lcss              -76.2%   +0.0%    +7.7%    +6.8%     +0.0%
> power             -74.6%   +0.0%    +5.4%    +4.6%     +0.9%
> spellcheck        -80.3%   +0.0%    -1.0%    -1.9%     +0.0%
> ------------------------------------------------------------
> Min               -80.3%   +0.0%    -1.0%    -1.9%     +0.0%
> Max               -74.6%   +0.0%    +8.6%    +8.4%     +0.9%
> Geometric Mean    -76.3%   +0.0%    +5.4%    +4.8%     +0.1%
>
> On Jun 27, 2011, at 6:18 PM, David Peixotto wrote:
>
>> I'll take a look at getting the llvm-gcc route going by switching the gct variable to use pthread_getspecific() on Mac OS X. I can do some benchmarking to measure the impact.
>>
>> I was playing around just to get the compilation to succeed. After a small change in StgCRun.c, the compile went through, but then it was getting a segfault in the stage 2 compiler because of the global register variables.
>>
>> I thought that llvm-gcc would complain about the global register variables, but it seems to accept them and generate the assembly code to read and write them. The only problem is that it will also use these registers for other purposes, so gct was getting stomped, which was causing the segfaults.
>>
>> So from what I can see, llvm-gcc dies at compile time when given __thread variables, and it accepts global register variables but can generate code that stomps on the register.
>>
>> -David
>>
>> On Jun 24, 2011, at 3:23 AM, Simon Marlow wrote:
>>
>>> On 21/06/2011 05:51, Manuel M T Chakravarty wrote:
>>>> austin seipp:
>>>>> (CC'ing Dan so he can chime in, for those who don't IRC.)
>>>>>
>>>>> Dan Knapp (dankna on freenode) is running OS X Lion on his machine (and the corresponding new Xcode tools, I believe), and apparently Apple have gone the whole way in the next release by making 'gcc' a symbolic link to 'llvm-gcc' by default.
>>>>
>>>> Just like my prediction ;)
>>>>
>>>>> It's likely that will soon be clang, given llvm-gcc is already deprecated as of LLVM 2.9. There is still a regular GCC bundled with Lion apparently; ISTR Dan saying the executable was under /Developer under the name 'i686-apple-darwin-gcc-4.2' or somesuch, but I can't verify that (Snow Leopard here). Anyone with Lion want to chime in?
>>>>
>>>> I would assume that 'gcc-4.2' will still point to the traditional GCC for a while. Especially with C++, clang is still behind, and there are still the odd code generator bugs in LLVM that require code generation with traditional gcc.
>>>>
>>>>> Dan was working on build fixes/RTS fixes last week to try and make GHC build cleanly with pthread_getspecific and work with compilers other than GCC. I think he did make some good headway in this area, but his work isn't done either.
>>>>>
>>>>> Considering global register variables are a rather rare and intricate GCC extension, it's much more likely that we will see __thread support in Clang first (TLS also has implications for C++0x, I've heard them say). It's not on their short-term TODO list, however. In the meantime, if Apple were to remove GCC entirely for some reason, we'd still need Dan's patches, wouldn't we?
>>>>
>>>> If we could move to clang (on OS X) that would be ideal, but as I wrote above I seriously doubt that Apple will entirely remove gcc (at least not before whatever cat comes after Lion).
>>>> So, for the time being, and until we can use clang, I think it would be wise to use 'gcc-4.2' as the default on OS X (instead of 'gcc', which appears set to morph into llvm-gcc soon). If we do that for GHC 7.2, then GHC 7.2 won't break once Apple flips the symlink over.
>>>>
>>>> Simon, what do you think?
>>>
>>> I have no strong opinions; you guys know the platform much better than me, so I'm happy to go with whatever you think makes the most sense.
>>>
>>> One thing I would keep an eye on is the performance of the GC, because the handling of the gct thread-local variable is critical. I can help you with some quick benchmarks if you want to test out changes.
>>>
>>> Cheers,
>>> Simon
>>>
>>>> Manuel
>>>>
>>>>> On Sun, Jun 19, 2011 at 9:43 PM, Manuel M T Chakravarty <[email protected]> wrote:
>>>>>> As llvm-gcc on OS X seems to require some work, I wonder whether we should by default build with the 'gcc-4.2' executable on OS X (which uses the traditional gcc backend), instead of the generic 'gcc' (probably still using 'gcc' as a fallback in configure if 'gcc-4.2' is not available). Then, when Apple makes the switch, binary GHC packages will continue to work.
>>>>>>
>>>>>> Manuel
>>>>>>
>>>>>> PS: I am all for resolving the problems with llvm-gcc, but that will likely take a while. It'd be good to get a fix into 7.2, though.
>>>>>>
>>>>>> Simon Marlow:
>>>>>>> On 01/06/2011 13:30, Manuel M T Chakravarty wrote:
>>>>>>>> Simon Marlow:
>>>>>>>>> On 01/06/2011 07:11, Manuel M T Chakravarty wrote:
>>>>>>>>>> Simon Marlow:
>>>>>>>>>>> On 30/05/2011 14:59, Manuel M T Chakravarty wrote:
>>>>>>>>>>>> It is no secret that Apple is moving away from the traditional GCC backend to LLVM.
>>>>>>>>>>>> In fact, Xcode (which bundles all command line developer tools on the Mac) today comes with two flavours of gcc: 'gcc' and 'llvm-gcc', which AFAIK only differ in the backend that is being used. Currently, the default is the traditional GCC backend, but it takes no precognition to realise that this will eventually change. The 'gcc' executable will use the LLVM backend and, at least for a while, the traditional backend will still be available under a different name.
>>>>>>>>>>>>
>>>>>>>>>>>> Unfortunately, GHC will break at this point, as the LLVM backend does not support pinned global registers. ('llvm-gcc' happily accepts the register assignment, but fails with a runtime error during code generation.)
>>>>>>>>>>>
>>>>>>>>>>> This shouldn't be a problem. We don't use pinned global registers any more, except in one place - the GC (see rts/sm/GCTDecl.h). There it's optional, but you lose a bit of performance by not using a pinned register. It's not a huge deal.
>>>>>>>>>>>
>>>>>>>>>>> Have you tried building GHC with llvm-gcc? I think I tried it on the RTS a year or so ago to check the LLVM output against gcc (LLVM wasn't quite as good at the time).
>>>>>>>>>>
>>>>>>>>>> Yes, I tried and it failed, while compiling the RTS, with
>>>>>>>>>>
>>>>>>>>>>   sorry, unimplemented: LLVM cannot handle register variable ‘R1’, report a bug
>>>>>>>>>>
>>>>>>>>>> This was using the 64bit version of GHC. I'll have a closer look.
>>>>>>>>>
>>>>>>>>> Perhaps that was when compiling StgCRun.c? It doesn't actually need register variables (on x86_64 at least), but it does include the header files, so that probably needs some #ifdefery somewhere for llvm-gcc.
>>>>>>>>
>>>>>>>> Yes, it's in 'StgCRun.c'. OK, and how about on i386 (or do you want to phase that arch out)?
>>>>>>>
>>>>>>> It doesn't look like the x86 code in StgCRun.c uses registers either. The sparc version does, but it could be rewritten.
>>>>>>>
>>>>>>>>> The other place, as I mentioned above, is rts/sm/GCTDecl.h, which will need to use a different method for declaring the garbage collector's thread-local state variable, gct. On x86_64 I found that using a fixed register was the fastest, but using a thread-local variable (the __thread modifier) also works.
>>>>>>>>
>>>>>>>> Just to make sure I understand correctly, are you saying that using a thread-local variable is already implemented as an option,
>>>>>>>
>>>>>>> Yes - look at the series of #ifdefs in that file; it's pretty straightforward to change how gct is declared for a particular platform.
>>>>>>>
>>>>>>> However, I've just done some poking around and it seems that __thread is not supported on OS X:
>>>>>>>
>>>>>>> http://lifecs.likai.org/2010/05/mac-os-x-thread-local-storage.html
>>>>>>>
>>>>>>> see also this thread about Clang:
>>>>>>>
>>>>>>> http://lists.cs.uiuc.edu/pipermail/cfe-dev/2011-March/013673.html
>>>>>>>
>>>>>>> It seems there might be support for __thread in the future, but not in the short term.
>>>>>>>
>>>>>>> It seems our very own David Peixotto tried building GHC with Clang a year ago and ran into the same thing:
>>>>>>>
>>>>>>> http://www.dmpots.com/blog/2010/05/08/building-ghc-with-clang.html
>>>>>>>
>>>>>>> So this is less than ideal. The short-term fix would be to #define gct to be a call to pthread_getspecific().
>>>>>>> The call will be inlined - the OS X headers define pthread_getspecific in terms of some inline assembly, but the optimiser won't know anything about the inline assembly, so it won't be able to common up multiple loads of gct, and that probably means it won't perform well. If that's the case, then the solution is to load gct into a temporary in the performance-critical functions in the GC (evacuate(), scavenge_block()), and add it as an argument to inline functions. I'd rather avoid having to do all that if possible.
>>>>>>>
>>>>>>> If you want to benchmark the GC, there are some good programs in nofib/gc.
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Simon
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> Cvs-ghc mailing list
>>>>>> [email protected]
>>>>>> http://www.haskell.org/mailman/listinfo/cvs-ghc
>>>>>
>>>>> --
>>>>> Regards,
>>>>> Austin

--
Dan Knapp
"An infallible method of conciliating a tiger is to allow oneself to be devoured." (Konrad Adenauer)
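The approach David took and the refinement Simon describes can be sketched in C. This is a minimal sketch under stated assumptions, not the code in rts/sm/GCTDecl.h. The names gc_thread, gct_key, gct_tls, gct_init(), set_gct(), copy_one() and the copy_words_*() functions are all hypothetical, and `__llvm__` stands in for the `llvm_CC_FLAVOR` symbol that David's patch defines. On an LLVM-based compiler gct expands to a pthread_getspecific() call; otherwise it is a plain __thread variable.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for the GC's per-thread state. */
typedef struct gc_thread_ {
    unsigned long copied;  /* words copied by this GC thread */
} gc_thread;

#if defined(__llvm__)
/* Assumption: __llvm__ stands in for llvm_CC_FLAVOR. Thread-local
 * storage goes through the POSIX thread-specific-data API. */
static pthread_key_t gct_key;

static int gct_init(void) { return pthread_key_create(&gct_key, NULL); }
static int set_gct(gc_thread *t) { return pthread_setspecific(gct_key, t); }

/* Each mention of gct is an opaque call (inline assembly on OS X, per
 * Simon), so the compiler generally cannot merge repeated lookups. */
#define gct ((gc_thread *)pthread_getspecific(gct_key))
#else
/* Compilers with __thread support: an ordinary thread-local variable. */
static __thread gc_thread *gct_tls;

static int gct_init(void) { return 0; }
static int set_gct(gc_thread *t) { gct_tls = t; return 0; }

#define gct gct_tls
#endif

/* Naive style: looks gct up again on every iteration. */
static void copy_words_naive(unsigned long n)
{
    for (unsigned long i = 0; i < n; i++)
        gct->copied++;
}

/* Hoisted style: one lookup, then the pointer is passed explicitly to an
 * inline helper, so the loop body does no thread-local access at all. */
static inline void copy_one(gc_thread *t) { t->copied++; }

static void copy_words_hoisted(unsigned long n)
{
    gc_thread *t = gct;  /* single lookup of the thread-local pointer */
    for (unsigned long i = 0; i < n; i++)
        copy_one(t);
}
```

copy_words_hoisted() follows the pattern Simon describes for evacuate() and scavenge_block(). It fetches gct once into a local and passes it down, instead of paying for a thread-local lookup at every use. Whether this recovers the GC slowdown David measured would have to be checked with benchmarks such as nofib/gc.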
