[Haskell-cafe] Re: Haskell version of ray tracer code is much slower than the original ML

Simon Marlow Fri, 22 Jun 2007 05:17:06 -0700

Philip Armstrong wrote:

On Thu, Jun 21, 2007 at 08:42:57PM +0100, Philip Armstrong wrote:

On Thu, Jun 21, 2007 at 03:29:17PM -0400, Mark T.B. Carroll wrote:

That's the old wiki. The new one gives the opposite advice! (As does
the ghc manual):

 http://www.haskell.org/ghc/docs/latest/html/users_guide/faster.html
 http://www.haskell.org/haskellwiki/Performance/Floating_Point


Incidentally, the latter page implies that ghc is being overly
pessimistic when compiling FP code without -fexcess-precision:

"On x86 (and other platforms with GHC prior to version 6.4.2), use
 the -fexcess-precision flag to improve performance of floating-point
 intensive code (up to 2x speedups have been seen). This will keep
 more intermediates in registers instead of memory, at the expense of
 occasional differences in results due to unpredictable rounding."

IIRC, it is possible to issue an instruction to the x86 FP unit which
makes all operations work on 64-bit Doubles, even though there are
80-bits available internally. Which then means there's no requirement
to spill intermediate results to memory in order to get the rounding
correct.

For some background on why GHC doesn't do this, see the comment "MORE FLOATINGPOINT MUSINGS..." in


  http://darcs.haskell.org/ghc/compiler/nativeGen/MachInstrs.hs

The main problem is floats: even if you put the FPU into 64-bit mode,
your float operations will be done at 64-bit precision. There are other
technical problems that we found with doing this; the comment above
elaborates.
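
To illustrate the point about floats, here is a minimal sketch that
emulates the effect in software; it does not touch the FPU control word.
The names stepwise and widened are hypothetical, chosen only for this
example. It compares Float arithmetic that is rounded to 32 bits after
every operation with the same arithmetic carried out at Double precision
and rounded to Float once at the end:

```haskell
module Main where

-- Every intermediate result is rounded to 32-bit Float precision.
stepwise :: Float -> Float -> Float
stepwise big small = (big + small) - big

-- Intermediates are kept at 64-bit Double precision and rounded to
-- Float only once, at the very end. This mimics what happens when
-- Float code runs on an FPU that works at 64-bit precision.
widened :: Float -> Float -> Float
widened big small = realToFrac ((b + s) - b)
  where
    b, s :: Double
    b = realToFrac big
    s = realToFrac small

main :: IO ()
main = do
  print (stepwise 1.0e8 1.0)
  print (widened 1.0e8 1.0)
```

Assuming strict IEEE single-precision rounding for Float, the two calls
should disagree: 1.0e8 + 1.0 is not representable as a Float and rounds
back to 1.0e8, so stepwise yields 0.0, whereas the Double intermediate
holds the sum exactly and widened yields 1.0.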

GHC passes -ffloat-store to GCC, unless you give the flag
-fexcess-precision. The idea is to try to get reproducible floating-point
results. The native code generator is unaffected by -fexcess-precision,
but it produces rubbish floating-point code on x86 anyway.

Ideally, -fexcess-precision should just affect whether the FP unit
uses 80 or 64 bit Doubles. It shouldn't make any performance
difference, although obviously the generated results may be different.

As an aside, if you use the -optc-mfpmath=sse option, then you only
get 64-bit Doubles anyway (on x86).

You probably want SSE2. If I ever get around to finishing it, the GHC
native code generator will be able to generate SSE2 code on x86 someday,
like it currently does for x86-64. For now, to get good FP performance on
x86, you probably want


  -fvia-C -fexcess-precision -optc-mfpmath=sse -optc-msse2

Cheers,
        Simon
