#594: Support use of SSE2 in the x86 native code generator
-----------------------------------------+----------------------------------
  Reporter:  simonmar                    |          Owner:  simonmar        
      Type:  task                        |         Status:  closed          
  Priority:  normal                      |      Milestone:  6.14.1          
 Component:  Compiler (NCG)              |        Version:  6.4.1           
Resolution:  fixed                       |       Keywords:                  
Difficulty:  Moderate (less than a day)  |             Os:  Unknown/Multiple
  Testcase:  N/A                         |   Architecture:  Unknown/Multiple
   Failure:  Runtime performance bug     |  
-----------------------------------------+----------------------------------
Changes (by simonmar):

  * status:  assigned => closed
  * resolution:  => fixed


Comment:

 Done:

 {{{
 Thu Feb  4 10:48:49 GMT 2010  Simon Marlow <[email protected]>
   * Implement SSE2 floating-point support in the x86 native code generator
 (#594)

   The new flag -msse2 enables code generation for SSE2 on x86.  It
   results in substantially faster floating-point performance; the main
   reason for doing this was that our x87 code generation is appallingly
   bad, and since we plan to drop -fvia-C soon, we need a way to generate
   half-decent floating-point code.

   The catch is that SSE2 is only available on CPUs that support it (P4+,
   AMD K8+).  We'll have to think hard about whether we should enable it
   by default for the libraries we ship.  In the meantime, at least
   -msse2 should be an acceptable replacement for "-fvia-C
   -optc-ffast-math -fexcess-precision".

   SSE2 also has the advantage of performing all operations at the
   correct precision, so floating-point results are consistent with other
   platforms.

   I also tweaked the x87 code generation a bit while I was here; now
   it's slightly less bad than before.
 }}}

 I measured the FF ray tracer benchmark, and `-msse2` seems on par with, or
 possibly better than, `"-fvia-C -optc-O3 -fexcess-precision -optc-ffast-math"`,
 although the results are quite variable on the machine I tried it on.  I
 suspect we're suffering from randomly misaligned Doubles on the stack and
 heap.
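
 A comparison like this could be tried on a small floating-point kernel.
 The program below is only an illustrative sketch, not the FF ray tracer:
 the function `sumOfRoots`, the iteration count and the file name
 `Bench.hs` are all assumptions made up for this example.

```haskell
module Main (main) where

import Data.List (foldl')

-- Hypothetical stand-in kernel: sum the square roots of 1..n in Double
-- arithmetic, using a strict left fold so the accumulator is not a thunk.
sumOfRoots :: Int -> Double
sumOfRoots n = foldl' step 0 [1 .. n]
  where
    step acc i = acc + sqrt (fromIntegral i)

main :: IO ()
main = print (sumOfRoots 10000000)
```

 Building it once with `ghc -O2 -msse2 Bench.hs` and once with
 `ghc -O2 -fvia-C -optc-O3 -fexcess-precision Bench.hs`, then timing each
 binary several times, would give a rough idea of the difference. Since
 the timings above varied a lot from run to run, a single measurement
 probably should not be trusted.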

-- 
Ticket URL: <http://hackage.haskell.org/trac/ghc/ticket/594#comment:12>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler
_______________________________________________
Glasgow-haskell-bugs mailing list
[email protected]
http://www.haskell.org/mailman/listinfo/glasgow-haskell-bugs
