[Haskell-cafe] Re: #haskell works

2007-12-20 Thread Simon Marlow

Tim Chevalier wrote:
> On 12/14/07, Dan Piponi [EMAIL PROTECTED] wrote:
>> There have been some great improvements in array handling recently. I
>> decided to have a look at the assembly language generated by some
>> simple array manipulation code and understand why C is at least twice
>> as fast as ghc 6.8.1. On the one hand it was disappointing to see
>> that the Haskell register allocator seems a bit inept and was loading
>> data into registers that should never have been spilled out of
>> registers in the first place.
>
> Someone who knows the backend better than I do can correct me if I'm
> wrong, but it's my understanding that GHC 6.8.1 doesn't even attempt
> to do any register allocation on x86. So -- register allocator? What
> register allocator?


That's not entirely true - there is a fairly decent linear-scan register 
allocator in GHC:

  http://darcs.haskell.org/ghc/compiler/nativeGen/RegAllocLinear.hs

The main bottleneck is not the quality of the register allocation (at 
least, not yet).


The first problem is that in order to get good performance when compiling 
via C we've had to lock various global variables into registers (the heap 
pointer, stack pointer etc.), which leaves too few registers free for 
argument passing on x86, so the stack is used too much.  This is probably 
why people often say that the register allocator sucks - in fact it is 
really the calling convention that sucks.  There is some other stupidness 
such as reloading values from the stack, though.


Another problem is that the backend doesn't turn recursion into loops (i.e. 
backward jumps), so those crappy calling conventions are used around every 
loop.  If we fixed that - which is pretty easy, we've tried it - then the 
bottleneck becomes the lack of loop optimisations in the native code 
generator, and we also run into the limitations of the current register 
allocator.  Fortunately the latter has been fixed: Ben Lippmeier has 
written a graph-colouring allocator, and it's available for trying out in 
GHC HEAD.
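
As an illustration of the kind of recursion in question, here is a minimal 
sketch of a tight loop written as a self-recursive worker. The names sumTo 
and go are hypothetical and not taken from GHC or from this thread. The 
recursive call to go is in tail position, so it is the sort of call that 
could in principle become a backward jump; the sketch does not show what any 
particular GHC version actually generates for it.

```haskell
{-# LANGUAGE BangPatterns #-}
module Main where

-- Sum the integers 1..n with a self-recursive worker.
-- The call to 'go' on the last line is a tail call: nothing remains to be
-- done after it returns, so no new stack frame is logically required.
sumTo :: Int -> Int
sumTo n = go 0 1
  where
    go :: Int -> Int -> Int
    go !acc !i
      | i > n     = acc                    -- loop exit
      | otherwise = go (acc + i) (i + 1)   -- loop back edge

main :: IO ()
main = print (sumTo 100)  -- prints 5050
```

Compiling with -ddump-asm is one way to inspect what the native code 
generator emits for go.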


Fixing it all properly means some fairly significant architectural changes, 
and dropping the via-C backend (except for bootstrapping on new platforms), 
which is what we'll be doing in 6.10.  I'd expect to see some dramatic 
improvements for those tight loops, in ByteString for example, but for 
typical Haskell code and GHC itself I'll be pleased if we get 10%.  We'll 
see.


Cheers,
Simon
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] Re: #haskell works

2007-12-20 Thread Tim Chevalier
On 12/20/07, Simon Marlow [EMAIL PROTECTED] wrote:
> That's not entirely true - there is a fairly decent linear-scan register
> allocator in GHC
>
> http://darcs.haskell.org/ghc/compiler/nativeGen/RegAllocLinear.hs
>
> the main bottleneck is not the quality of the register allocation (at
> least, not yet).
>
> The first problem is that in order to get good performance when compiling
> via C we've had to lock various global variables into registers (the heap
> pointer, stack pointer etc.), which leaves too few registers free for
> argument passing on x86, so the stack is used too much.  This is probably
> why people often say that the register allocator sucks - in fact it is
> really the calling convention that sucks.  There is some other stupidness
> such as reloading values from the stack, though.

[snipped further reasons]

Thanks for enlightening me. (I had been opting to believe the various
rumors and hearsay floating around rather than actually reading the
source :-)

One reason why I care about this is that over the summer I was trying
to do some performance measurements for House. One of the experiments
I did was measuring how long it took to run a loop of Haskell code
that just did a no-op FFI call. This was still ten times slower than a
loop in C that called the same no-op function. I looked at the
generated code (with the native-code backend), noticed the issues you
mentioned above (reloading values from the stack, and so on), and
concluded that there was probably a good reason why the backend was
being worked on actively. The -fvia-C code wasn't much better.
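
A minimal sketch of what such a measurement loop might look like follows. 
The no-op C function used in the experiment is not shown in this thread, so 
the sketch assumes the C library's abs as a cheap stand-in; c_abs and 
callLoop are hypothetical names. It is not the code that was measured for 
House.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
module Main where

import Control.Monad (replicateM_)
import Foreign.C.Types (CInt (..))

-- Assumption: 'abs' from the C standard library stands in for the no-op
-- function. It is imported in IO so that every iteration really performs
-- the foreign call.
foreign import ccall unsafe "stdlib.h abs"
  c_abs :: CInt -> IO CInt

-- Perform the foreign call n times, discarding the results.
callLoop :: Int -> IO ()
callLoop n = replicateM_ n (c_abs 0)

main :: IO ()
main = callLoop 10000000
```

The program prints nothing; it is meant to be timed from outside and 
compared against a C loop that makes the same calls.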

However, this was with GHC 6.2, so obviously this suggests that
porting House to a newer GHC version might be worthwhile for us to do
:-)

Cheers,
Tim

-- 
Tim Chevalier * catamorphism.org * Often in error, never in doubt
Dare to be naive.--R. Buckminster Fuller


[Haskell-cafe] Re: #haskell works

2007-12-15 Thread Peter Hercek

Andrew Coppin wrote:
> (I suppose I could try writing a nop program and timing it. But 
> personally I don't have any way of timing things to that degree of 
> accuracy. I understand there are command line tools on Unix that will do 
> it, but not here.)


You can try, for example, ptime (http://www.pc-tools.net/win32/ptime/)
to measure times more accurately on Windows. I tried it a few years ago
and it seemed to work. If you do not mind installing Cygwin, you can
get the time command from it.

The only problem is that neither ptime nor Cygwin's time adds the times
of child processes to the result. Unix tools do that by default (since
child accounting info is added to the parent process if the child is
waited for).

If you want to add the children's time to your result, you probably need
to write your own utility for Win32 timing. It should be something like
100 lines of C code. See the QueryInformationJobObject Win32 API function
to start. I know of only one commercial tool which can take children's
time into account on Windows. If you write it and decide to
release it under a free license, let me know :)

Peter.



[Haskell-cafe] Re: #haskell works

2007-12-15 Thread Peter Hercek

Tim Chevalier wrote:
> Try the -Rghc-timing flag.


Interesting, that one does not work in my program compiled with
ghc 6.8.1 (it looks like the ghc runtime does not consume it but passes
it to my Haskell code). +RTS -tstderr works, but its usability is
limited since it provides only elapsed time and not the process
CPU times.

Peter.



Re: [Haskell-cafe] Re: #haskell works

2007-12-15 Thread Tim Chevalier
On 12/15/07, Peter Hercek [EMAIL PROTECTED] wrote:
> Tim Chevalier wrote:
>> Try the -Rghc-timing flag.
>
> Interesting, that one does not work in my program compiled with
> ghc 6.8.1 (looks like ghc runtime does not consume it but passes
> it to my haskell code). +RTS -tstderr works but its usability is
> limited since it provides only elapsed time and not the process
> cpu times.


Sorry, my mistake -- it's an RTS option, so:

./program +RTS -Rghc-timing -RTS

and I guess you have to compile with -prof.

Cheers,
Tim

-- 
Tim Chevalier * catamorphism.org * Often in error, never in doubt
Live fast, love hard, and wear corrective lenses if you need them.
--Webb Wilder


[Haskell-cafe] Re: #haskell works

2007-12-15 Thread Peter Hercek

Tim Chevalier wrote:
> On 12/15/07, Peter Hercek [EMAIL PROTECTED] wrote:
>> Tim Chevalier wrote:
>>> Try the -Rghc-timing flag.
>>
>> Interesting, that one does not work in my program compiled with
>> ghc 6.8.1 (looks like ghc runtime does not consume it but passes
>> it to my haskell code). +RTS -tstderr works but its usability is
>> limited since it provides only elapsed time and not the process
>> cpu times.
>
> Sorry, my mistake -- it's an RTS option, so:
>
> ./program +RTS -Rghc-timing -RTS
>
> and I guess you have to compile with -prof.



I guess it is just buggy in 6.8.1.
That option does not seem to work, not even as an RTS option
and not even when I compile with -prof -auto-all.
But the user guide states that the result should be the same
as with +RTS -tstderr, and if so then it is not that
interesting (since CPU times are missing). Btw, +RTS -tstderr
works without -prof too, which is nice :)
I liked the idea that a ghc-generated exe can report its times
too (I meant CPU times as well, not only the elapsed time),
but external programs work well for this too, so never mind.
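
For reference, a program can also measure its own CPU and elapsed time from 
inside Haskell. The following is a minimal sketch, not a replacement for the 
RTS flags discussed above: timed is a hypothetical helper, getCPUTime comes 
from System.CPUTime, and getMonotonicTime comes from GHC.Clock, which 
assumes a much newer base library (4.11 or later) than the one shipped with 
6.8.1. Like ptime, it does not account for child processes.

```haskell
module Main where

import GHC.Clock (getMonotonicTime)
import System.CPUTime (getCPUTime)

-- Run an action and return its result together with the CPU time and the
-- elapsed (wall-clock) time it took, both in seconds.
timed :: IO a -> IO (a, Double, Double)
timed act = do
  cpu0  <- getCPUTime        -- picoseconds of CPU time used so far
  wall0 <- getMonotonicTime  -- seconds on a monotonic clock
  x     <- act
  cpu1  <- getCPUTime
  wall1 <- getMonotonicTime
  let cpuSecs = fromIntegral (cpu1 - cpu0) / 1e12
  return (x, cpuSecs, wall1 - wall0)

main :: IO ()
main = do
  -- '$!' forces the sum inside the timed action rather than at 'show'.
  (s, cpu, wall) <- timed (return $! sum [1 .. 5000000 :: Int])
  putStrLn ("result:  " ++ show s)
  putStrLn ("cpu:     " ++ show cpu ++ " s")
  putStrLn ("elapsed: " ++ show wall ++ " s")
```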

Thanks,
Peter.
