#1138: The -fexcess-precision flag is ignored if supplied on the command line.
----------------------------------------+-----------------------------------
 Reporter:  dons                        |          Owner:             
     Type:  bug                         |         Status:  new        
 Priority:  normal                      |      Milestone:             
Component:  Driver                      |        Version:  6.6        
 Severity:  normal                      |     Resolution:             
 Keywords:  numerics, excess-precision  |     Difficulty:  Easy (1 hr)
 Testcase:                              |   Architecture:  x86        
       Os:  Unknown                     |  
----------------------------------------+-----------------------------------
Description:

 The numerics/Double-based programs on the great language shootout were
 performing poorly. Investigation revealed that GHC silently ignores the
 -fexcess-precision flag when it is supplied on the command line. If it is
 supplied as a {-# OPTIONS -fexcess-precision #-} pragma, it is respected.

 Consider the following shootout entry for the 'mandelbrot' benchmark. It
 writes the Mandelbrot set to stdout as a binary PBM ("P4") image.

 {{{
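 import System.Environment (getArgs)
 import System.IO
 import Foreign
 import Foreign.Marshal.Array

 main = do
     w <- getArgs >>= readIO . head      -- image width (= height) in pixels
     let n      = w `div` 8              -- bytes per row: 8 pixels per byte
         m  = 2 / fromIntegral w         -- spacing between adjacent pixels
     putStrLn ("P4\n"++show w++" "++show w)  -- PBM header: magic, width, height
     p <- mallocArray0 n                 -- buffer for one row of output bytes

     unfold n (next_x w m n) p (T 1 0 0 (-1))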

 unfold :: Int -> (T -> Maybe (Word8,T)) -> Ptr Word8 -> T -> IO ()
 unfold !i !f !ptr !x0 = loop x0
   where
     loop !x = go ptr 0 x

     go !p !n !x = case f x of
         Just (w,y) | n /= i -> poke p w >> go (p `plusPtr` 1) (n+1) y
         Nothing             -> hPutBuf stdout ptr i
         _                   -> hPutBuf stdout ptr i >> loop x
 {-# NOINLINE unfold #-}

 data T = T !Int !Int !Int !Double

 next_x !w !iw !bw (T bx x y ci)
     | y  == w   = Nothing
     | bx == bw  = Just (loop_x w x 8 iw ci 0, T 1 0    (y+1)   (iw+ci))
     | otherwise = Just (loop_x w x 8 iw ci 0, T (bx+1) (x+8) y ci)

 loop_x !w !x !n !iw !ci !b
     | x < w = if n == 0
                     then b
                     else loop_x w (x+1) (n-1) iw ci (b+b+v)
     | otherwise = b `shiftL` n
   where
     v = fractal 0 0 (fromIntegral x * iw - 1.5) ci 50

 fractal :: Double -> Double -> Double -> Double -> Int -> Word8
 fractal !r !i !cr !ci !k
     | r2 + i2 > 4 = 0
     | k == 0      = 1
     | otherwise   = fractal (r2-i2+cr) ((r+r)*i+ci) cr ci (k-1)
   where
     (!r2,!i2) = (r*r,i*i)
 }}}
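
 The least obvious part of the entry is the bit packing in loop_x. In the
 binary PBM format, each row that follows the P4 header is stored as bytes
 that each hold 8 pixels, most significant bit first, and a short final byte
 is padded with zero bits, which is what the shiftL branch of loop_x does. The
 sketch below isolates that step in a hypothetical packRow helper that takes
 a list of pixels. The name and the list-based interface are assumptions made
 for illustration and are not part of the benchmark.

 {{{
 import Data.Bits (shiftL)
 import Data.Word (Word8)

 -- Hypothetical helper, not part of the benchmark: pack one row of pixels
 -- into P4 (binary PBM) bytes, 8 pixels per byte, most significant bit first.
 -- True is a black pixel, i.e. a point that did not escape.
 packRow :: [Bool] -> [Word8]
 packRow [] = []
 packRow px = padded : packRow rest
   where
     (chunk, rest) = splitAt 8 px
     -- Same recurrence as loop_x: b' = b + b + v for each pixel ...
     bits   = foldl (\b v -> b + b + fromIntegral (fromEnum v)) 0 chunk
     -- ... then a short final chunk is padded with zero bits, like shiftL in loop_x.
     padded = bits `shiftL` (8 - length chunk)

 main :: IO ()
 main = print (packRow [True, False, True, True, False, False, False, False, True, True])
 -- prints [176,192]: 10110000, then 11 padded to 11000000
 }}}

 This helper is only an illustration. The timings below are for the
 benchmark program itself.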

 We can compile and run this as follows:

 {{{
 $ ghc -O -fglasgow-exts -optc-march=pentium4 -fbang-patterns \
       -funbox-strict-fields -optc-O2 -optc-mfpmath=sse -optc-msse2 \
       -fexcess-precision -o m1 mandel3.hs -no-recomp

 $ time ./m1 3000 > /dev/null
 ./m1 3000 > /dev/null  8.12s user 0.00s system 99% cpu 8.143 total
 }}}

 8s is roughly 3x slower than C (or worse).

 Now, if we add the following pragma to the top of the file:

 {{{
 {-# OPTIONS -fexcess-precision #-}
 }}}

 and recompile and rerun:

 {{{
 $ ghc -O -fglasgow-exts -optc-march=pentium4 -fbang-patterns \
       -funbox-strict-fields -optc-O2 -optc-mfpmath=sse -optc-msse2 \
       -fexcess-precision -o m1 mandel3.hs -no-recomp

 $ time ./m1 3000 > /dev/null
 ./m1 3000 > /dev/null  2.94s user 0.00s system 99% cpu 2.945 total
 }}}

 Nearly 3x faster, and competitive with C.

 Across the board the -fexcess-precision flag seems to be ignored by GHC,
 affecting all Double-based entries on the shootout.

 A diff of the ghc -v3 output shows that GCC is still given -ffloat-store
 when -fexcess-precision is supplied on the command line.
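
 The expected behavior can be written down as a small model. The sketch
 below is hypothetical and is not GHC's driver code. The types Arch,
 FlagSource and Settings and the function gccFloatFlags are invented for
 illustration. The model encodes the rule involved here: on x86, GCC is given
 -ffloat-store unless excess precision was requested. It also makes explicit
 that the command-line flag and the OPTIONS pragma should lead to the same
 GCC arguments.

 {{{
 -- Hypothetical model, not GHC source: which float-related flags the driver
 -- should hand to GCC when compiling via C.
 data Arch = X86 | OtherArch
   deriving (Eq, Show)

 -- How -fexcess-precision was supplied, if at all.
 data FlagSource = CommandLine | OptionsPragma
   deriving (Eq, Show)

 data Settings = Settings
   { arch            :: Arch
   , excessPrecision :: Maybe FlagSource  -- Nothing: flag not supplied
   }

 -- On x86, -ffloat-store keeps floating-point variables out of registers, so
 -- they are rounded to double precision instead of keeping the x87's extra
 -- bits, at a large cost in speed. It should be passed only when excess
 -- precision was NOT requested, whichever way the flag was supplied.
 gccFloatFlags :: Settings -> [String]
 gccFloatFlags s
   | arch s == X86, Nothing <- excessPrecision s = ["-ffloat-store"]
   | otherwise                                   = []

 main :: IO ()
 main = do
   print (gccFloatFlags (Settings X86 Nothing))              -- ["-ffloat-store"]
   print (gccFloatFlags (Settings X86 (Just CommandLine)))   -- []
   print (gccFloatFlags (Settings X86 (Just OptionsPragma))) -- []
 }}}

 If the command-line setting were lost somewhere before the C compilation
 step, the CommandLine case would behave like the Nothing case. That would be
 consistent with the timings reported above.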

Comment (by dons):

 A smaller example:

 {{{
 import Text.Printf

 main = go (1/3) 3 1

 go :: Double -> Double -> Int -> IO ()
 go !x !y !i
     | i == 100000000 = printf "%f\n" (x+y)
     | otherwise      = go (x*y/3) (x*9) (i+1)
 }}}

 This program, run with the following flags:

 {{{
 $ ghc -O -fexcess-precision -fbang-patterns -optc-O -optc-ffast-math \
       -optc-mfpmath=sse -optc-msse2 A.hs -o a
 }}}

 Runs in:

 {{{
 $ time ./a
 3.3333333333333335
 ./a  4.23s user 0.01s system 97% cpu 4.350 total
 }}}

 If we then move -fexcess-precision into the file, as a pragma:

 {{{
 $ time ./a
 3.3333333333333335
 ./a  0.91s user 0.00s system 99% cpu 0.908 total
 }}}

 Note that asking GCC to generate SSE instructions also gives an improvement
 of 10% or more.

 For reference, this C program:

 {{{
 #include <stdio.h>

 int main()
 {
     double x = 1.0/3.0;
     double y = 3.0;
     int i    = 1;
     for (; i<=100000000; i++) {
         x = x*y/3.0;
         y = x*9.0;
     }
     printf("%f\n", x+y);
     return 0;
 }

 }}}

 {{{
 $ gcc -O3 -ffast-math -mfpmath=sse -msse2 t.c -o a.out -std=c99
 $ time ./a.out
 3.333333
 ./a.out  1.00s user 0.00s system 98% cpu 1.012 total
 }}}

 So with the pragma, GHC (0.91s) slightly outperforms GCC (1.00s) on this
 loop, which is pretty nice :-)

 But now I wonder: how much of the bad press about GHC's numeric performance
 has been due solely to -fexcess-precision being ignored?

-- 
Ticket URL: <http://hackage.haskell.org/trac/ghc/ticket/1138>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler