[Haskell-cafe] Re: #haskell works
Tim Chevalier wrote:
> On 12/14/07, Dan Piponi [EMAIL PROTECTED] wrote:
>> There have been some great improvements in array handling recently. I decided to have a look at the assembly language generated by some simple array manipulation code and understand why C is at least twice as fast as ghc 6.8.1. On the one hand it was disappointing to see that the Haskell register allocator seems a bit inept and was loading data into registers that should never have been spilled out of registers in the first place.
>
> Someone who knows the backend better than I do can correct me if I'm wrong, but it's my understanding that GHC 6.8.1 doesn't even attempt to do any register allocation on x86. So -- register allocator? What register allocator?

That's not entirely true - there is a fairly decent linear-scan register allocator in GHC:

http://darcs.haskell.org/ghc/compiler/nativeGen/RegAllocLinear.hs

The main bottleneck is not the quality of the register allocation (at least, not yet).

The first problem is that in order to get good performance when compiling via C we've had to lock various global variables into registers (the heap pointer, stack pointer, etc.), which leaves too few registers free for argument passing on x86, so the stack is used too much. This is probably why people often say that the register allocator sucks - in fact it is really the calling convention that sucks. There is some other stupidness such as reloading values from the stack, though.

Another problem is that the backend doesn't turn recursion into loops (i.e. backward jumps), so those crappy calling conventions are used around every loop. If we fixed that - which is pretty easy, we've tried it - then the bottleneck becomes the lack of loop optimisations in the native code generator, and we also run into the limitations of the current register allocator. Fortunately the latter has been fixed: Ben Lippmeier has written a graph-colouring allocator, and it's available for trying out in GHC HEAD.
Fixing it all properly means some fairly significant architectural changes, and dropping the via-C backend (except for bootstrapping on new platforms), which is what we'll be doing in 6.10. I'd expect to see some dramatic improvements for those tight loops, in ByteString for example, but for typical Haskell code and GHC itself I'll be pleased if we get 10%. We'll see.

Cheers,
Simon
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
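Simon's point about recursion and loops can be illustrated with a small C sketch. It is a hypothetical example, not GHC output. It writes the same summation twice: once as a self tail call and once as an explicit loop. A backend that turns the tail call into a backward jump effectively produces the second form. The former arguments then become loop variables that can stay in registers, so they no longer go through the calling convention on every iteration. The names `sum_rec` and `sum_loop` are illustrative only.

```c
#include <assert.h>

/* Hypothetical example (not GHC output): sum of 0..n-1 as a self tail call.
 * Compiled naively, every iteration is a genuine call that passes i, n and
 * acc through the calling convention. */
long sum_rec(long i, long n, long acc)
{
    if (i >= n)
        return acc;
    return sum_rec(i + 1, n, acc + i);   /* tail call to itself */
}

/* The same computation after the tail call has been turned into a
 * backward jump: the former arguments are now loop variables, which a
 * register allocator can keep in registers for the whole loop. */
long sum_loop(long n)
{
    assert(n >= 0);
    long i = 0, acc = 0;
    while (i < n) {          /* the backward jump replaces the call */
        acc += i;
        i += 1;
    }
    return acc;
}
```

For example, `sum_rec(0, 1000, 0)` and `sum_loop(1000)` both return 499500. Optimising C compilers often apply this tail-call-to-loop rewrite to `sum_rec` on their own, and that is one reason such loops run fast in C.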
[Haskell-cafe] Re: #haskell works
On 12/20/07, Simon Marlow [EMAIL PROTECTED] wrote:
> That's not entirely true - there is a fairly decent linear-scan register allocator in GHC
> http://darcs.haskell.org/ghc/compiler/nativeGen/RegAllocLinear.hs
> the main bottleneck is not the quality of the register allocation (at least, not yet).
> The first problem is that in order to get good performance when compiling via C we've had to lock various global variables into registers (the heap pointer, stack pointer etc.), which leaves too few registers free for argument passing on x86, so the stack is used too much. This is probably why people often say that the register allocator sucks - in fact it is really the calling convention that sucks. There is some other stupidness such as reloading values from the stack, though.
> [snipped further reasons]

Thanks for enlightening me. (I had been opting to believe the various rumors and hearsay floating around rather than actually reading the source :-)

One reason why I care about this is that over the summer I was trying to do some performance measurements for House. One of the experiments I did was measuring how long it took to run a loop of Haskell code that just did a no-op FFI call. This was still ten times slower than a loop in C that called the same no-op function. I looked at the generated code (with the native-code backend), noticed the issues you mentioned above (reloading values from the stack, and so on), and concluded that there was probably a good reason why the backend was being worked on actively. The -fvia-C code wasn't much better. However, this was with GHC 6.2, so obviously this suggests that porting House to a newer GHC version might be worthwhile for us to do :-)

Cheers,
Tim

-- 
Tim Chevalier * catamorphism.org * Often in error, never in doubt
Dare to be naive. --R. Buckminster Fuller
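Tim does not show his benchmark code. Below is a hypothetical sketch of the C baseline he describes, a loop that calls a no-op function. It times n calls with POSIX `clock_gettime`. The names `noop`, `noop_ptr` and `time_noop_calls` are illustrative. Calling through a `volatile` function pointer is an assumption added to stop the compiler from inlining the call or deleting the loop.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <time.h>

/* Hypothetical no-op target for the benchmark loop. */
static void noop(void) { }

/* The pointer is volatile, so it is re-read on every iteration; the compiler
 * can therefore neither inline the call nor optimise the loop away. */
static void (*volatile noop_ptr)(void) = noop;

/* Calls the no-op function n times. Returns the elapsed wall-clock time in
 * seconds and stores the number of calls actually made in *calls. */
double time_noop_calls(long n, long *calls)
{
    struct timespec start, end;
    long count = 0;

    assert(calls != NULL && n >= 0);
    clock_gettime(CLOCK_MONOTONIC, &start);
    for (long i = 0; i < n; i++) {
        noop_ptr();
        count++;
    }
    clock_gettime(CLOCK_MONOTONIC, &end);

    *calls = count;
    return (double)(end.tv_sec - start.tv_sec)
         + (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}
```

Dividing the returned time by n gives a rough per-call cost. The Haskell half of such a comparison would make the same call through a `foreign import` in a loop of the same length.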
[Haskell-cafe] Re: #haskell works
Andrew Coppin wrote:
> (I suppose I could try writing a nop program and timing it. But personally I don't have any way of timing things to that degree of accuracy. I understand there are command line tools on Unix that will do it, but not here.)

You can try, for example, this one to measure times more accurately on Windows:
http://www.pc-tools.net/win32/ptime/

I tried it a few years ago and it seemed to work. If you do not mind installing Cygwin, you can also get the time command from it. The only problem is that neither ptime nor Cygwin's time adds the times of child processes to the result. Unix tools do that by default (a child's accounting info is added to its parent process when the parent waits for the child). If you want to include children's time in your result, you probably need to write your own Win32 timing utility. It should be something like 100 lines of C code; the QueryInformationJobObject Win32 API function is a good place to start. I know of only one commercial tool which can take children's time into account on Windows. If you write such a utility and decide to release it under a free license, let me know :)

Peter.
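Peter's remark about Unix child accounting can be seen with the POSIX `getrusage` call. Once the parent has waited for a child, the child's CPU time shows up under `RUSAGE_CHILDREN`. The sketch below is a POSIX illustration, not the Windows `QueryInformationJobObject` utility he suggests. It runs a command, waits for it, and reports elapsed, user and system time, roughly as time(1) does. The names `run_and_time` and `struct run_times` are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical result type for one timed run. */
struct run_times {
    double elapsed;   /* wall-clock seconds */
    double user;      /* user-mode CPU seconds of the child (and its waited-for descendants) */
    double sys;       /* kernel-mode CPU seconds, same scope */
    int    status;    /* raw status from waitpid */
};

static double tv_seconds(struct timeval tv)
{
    return (double)tv.tv_sec + (double)tv.tv_usec / 1e6;
}

/* Runs the NULL-terminated command argv, waits for it and fills *out.
 * Returns 0 on success, -1 if fork or waitpid fails. */
int run_and_time(char *const argv[], struct run_times *out)
{
    struct rusage before, after;
    struct timespec t0, t1;

    assert(argv != NULL && argv[0] != NULL && out != NULL);
    getrusage(RUSAGE_CHILDREN, &before);   /* totals for children reaped so far */
    clock_gettime(CLOCK_MONOTONIC, &t0);

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execvp(argv[0], argv);
        _exit(127);                        /* only reached if exec failed */
    }
    if (waitpid(pid, &out->status, 0) < 0)
        return -1;

    clock_gettime(CLOCK_MONOTONIC, &t1);
    getrusage(RUSAGE_CHILDREN, &after);    /* now includes the child we waited for */

    out->elapsed = (double)(t1.tv_sec - t0.tv_sec)
                 + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
    out->user = tv_seconds(after.ru_utime) - tv_seconds(before.ru_utime);
    out->sys  = tv_seconds(after.ru_stime) - tv_seconds(before.ru_stime);
    return 0;
}
```

The child's times only reach `RUSAGE_CHILDREN` because `waitpid` reaps it. That is the "if the child is waited for" condition Peter mentions.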
[Haskell-cafe] Re: #haskell works
Tim Chevalier wrote:
> Try the -Rghc-timing flag.

Interesting, that one does not work in my program compiled with ghc 6.8.1 (it looks like the GHC runtime does not consume it but passes it on to my Haskell code). +RTS -tstderr works, but its usefulness is limited since it provides only the elapsed time and not the process CPU times.

Peter.
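Peter distinguishes elapsed time from process CPU time, and the difference is easy to demonstrate. This is a POSIX sketch that assumes `clock_gettime` supports `CLOCK_PROCESS_CPUTIME_ID`. A process that mostly sleeps accumulates wall-clock time but almost no CPU time, so an elapsed-only figure cannot tell waiting apart from computing. The helper names `now` and `measure_sleep` are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <time.h>

/* Reads the given clock in seconds. */
static double now(clockid_t clock)
{
    struct timespec ts;
    int rc = clock_gettime(clock, &ts);
    assert(rc == 0);
    (void)rc;
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Sleeps for `seconds` and reports how much wall-clock time (*wall) and how
 * much CPU time of this process (*cpu) passed meanwhile. */
void measure_sleep(double seconds, double *wall, double *cpu)
{
    struct timespec req;
    req.tv_sec = (time_t)seconds;
    req.tv_nsec = (long)((seconds - (double)req.tv_sec) * 1e9);

    double w0 = now(CLOCK_MONOTONIC);
    double c0 = now(CLOCK_PROCESS_CPUTIME_ID);
    nanosleep(&req, NULL);   /* blocks without consuming CPU */
    *wall = now(CLOCK_MONOTONIC) - w0;
    *cpu  = now(CLOCK_PROCESS_CPUTIME_ID) - c0;
}
```

For a 0.1-second sleep, `*wall` comes out at about 0.1 while `*cpu` stays close to zero. A busy loop of the same length would bring the two figures much closer together.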
Re: [Haskell-cafe] Re: #haskell works
On 12/15/07, Peter Hercek [EMAIL PROTECTED] wrote:
> Tim Chevalier wrote:
>> Try the -Rghc-timing flag.
> Interesting, that one does not work in my program compiled with ghc 6.8.1 (looks like ghc runtime does not consume it but passes it to my haskell code). +RTS -tstderr works but its usability is limited since it provides only elapsed time and not the process cpu times.

Sorry, my mistake -- it's an RTS option, so:

./program +RTS -Rghc-timing -RTS

and I guess you have to compile with -prof.

Cheers,
Tim

-- 
Tim Chevalier * catamorphism.org * Often in error, never in doubt
Live fast, love hard, and wear corrective lenses if you need them. --Webb Wilder
[Haskell-cafe] Re: #haskell works
Tim Chevalier wrote:
> On 12/15/07, Peter Hercek [EMAIL PROTECTED] wrote:
>> Tim Chevalier wrote:
>>> Try the -Rghc-timing flag.
>> Interesting, that one does not work in my program compiled with ghc 6.8.1 (looks like ghc runtime does not consume it but passes it to my haskell code). +RTS -tstderr works but its usability is limited since it provides only elapsed time and not the process cpu times.
> Sorry, my mistake -- it's an RTS option, so:
> ./program +RTS -Rghc-timing -RTS
> and I guess you have to compile with -prof.

I guess it is just buggy in 6.8.1. That option does not seem to work, not even as an RTS option, and not even when I compile with -prof -auto-all. But the user guide states that the result should be the same as with +RTS -tstderr, and if so, it is not that interesting (since the CPU times are missing). Btw, +RTS -tstderr works without -prof too, which is nice :)

I liked the idea that a GHC-generated executable can report its own times too (I meant the CPU times as well, not only the elapsed time), but external programs work well for this too, so never mind.

Thanks,
Peter.