On 17.07.2008, at 17:42, Ian Lynagh wrote:
On Thu, Jul 17, 2008 at 05:18:01PM +0200, Henning Thielemann wrote:
Complex.magnitude must prevent overflows: if you just square
1e200::Double you get an overflow, although the end result may also be
around 1e200. I guess that, to this end, Complex.magnitude separates
mantissa and exponent, but I'm afraid this is done via Integers.
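
A quick way to see the overflow described above, as a minimal sketch for Double only. naiveMagnitude is a name made up for this illustration and is not part of Data.Complex. The last two lines show that decodeFloat hands the mantissa back as an Integer, and the Haskell Report's default definition of exponent is written in terms of decodeFloat.

```haskell
import Data.Complex (Complex((:+)), magnitude)

-- Hypothetical helper for illustration: squares the components directly,
-- so the intermediate result overflows for very large components.
naiveMagnitude :: Complex Double -> Double
naiveMagnitude (x :+ y) = sqrt (x * x + y * y)

main :: IO ()
main = do
  let z = 1e200 :+ 1e200 :: Complex Double
  -- 1e200 * 1e200 is not representable as a Double, so the sum is Infinity.
  print (isInfinite (naiveMagnitude z))
  -- The library version rescales first and stays finite.
  print (magnitude z)
  -- decodeFloat returns (Integer mantissa, Int exponent).
  print (decodeFloat (1e200 :: Double))
  print (exponent (1e200 :: Double))
```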

Here's the code:

{-# SPECIALISE magnitude :: Complex Double -> Double #-}
magnitude :: (RealFloat a) => Complex a -> a
magnitude (x:+y) =  scaleFloat k
                     (sqrt ((scaleFloat mk x)^(2::Int) + (scaleFloat mk y)^(2::Int)))
                    where k  = max (exponent x) (exponent y)
                          mk = - k

So the slowdown may be due to the scaling, presumably to prevent
overflow as you say. However, the e^(2 :: Int) may also be causing a
slowdown, as (^) is lazy in its first argument; I'm not sure if there is
a rule that will rewrite that to e*e. Stefan, perhaps you can try timing
with the above code, and also with:

{-# SPECIALISE magnitude :: Complex Double -> Double #-}
magnitude :: (RealFloat a) => Complex a -> a
magnitude (x:+y) =  scaleFloat k
                     (sqrt (sqr (scaleFloat mk x) + sqr (scaleFloat mk y)))
                    where k  = max (exponent x) (exponent y)
                          mk = - k
                          sqr x = x * x

and let us know what the results are?
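
Before timing, it may be worth checking that the two scaling variants compute the same values. The following is a sketch under the assumption that the definitions are pasted into a standalone module; the names magnitudePow, magnitudeSqr and samples are invented here so they do not clash with Data.Complex.magnitude.

```haskell
import Data.Complex (Complex((:+)))

-- Scaling version that squares with (^), as in the library code.
magnitudePow :: RealFloat a => Complex a -> a
magnitudePow (x :+ y) =
  scaleFloat k (sqrt (scaleFloat mk x ^ (2 :: Int) + scaleFloat mk y ^ (2 :: Int)))
  where
    k  = max (exponent x) (exponent y)
    mk = negate k

-- Scaling version that squares with an explicit multiplication.
magnitudeSqr :: RealFloat a => Complex a -> a
magnitudeSqr (x :+ y) =
  scaleFloat k (sqrt (sqr (scaleFloat mk x) + sqr (scaleFloat mk y)))
  where
    k     = max (exponent x) (exponent y)
    mk    = negate k
    sqr v = v * v

-- A few inputs, including ones where naive squaring would overflow
-- or underflow.
samples :: [Complex Double]
samples = [3 :+ 4, 1e200 :+ 1e200, 1e-200 :+ 1e-200, 0 :+ 0]

main :: IO ()
main = mapM_ report samples
  where
    report z = print (z, magnitudePow z, magnitudeSqr z)
```

If the two columns match, any difference in runtime comes from how the squaring is compiled rather than from the result.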

thanks ian, here are the absolute runtimes (non-instrumented code) and the corresponding entries in the profile:

c_magnitude0 (Data.Complex.magnitude)           0m7.249s
c_magnitude1 (non-scaling version)              0m1.176s
c_magnitude2 (scaling version, strict square)   0m3.278s

             %time  %alloc
             (inherited)

c_magnitude0 91.6   90.2
c_magnitude1 41.7   49.6
c_magnitude2 81.5   71.1
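
For anyone who wants to repeat a comparison of this kind, here is a rough timing sketch. It is not the program that produced the numbers above: the workload (sumMagnitudes), the iteration count and the helper names are assumptions chosen for illustration, and CPU time measured this way is only a coarse indicator.

```haskell
import Control.Exception (evaluate)
import Data.Complex (Complex((:+)), magnitude)
import Data.List (foldl')
import System.CPUTime (getCPUTime)
import Text.Printf (printf)

-- Non-scaling version: fast, but overflows for large components.
nonScaling :: Complex Double -> Double
nonScaling (x :+ y) = sqrt (x * x + y * y)

-- Scaling version with an explicit multiplication for the square.
scalingStrict :: Complex Double -> Double
scalingStrict (x :+ y) =
  scaleFloat k (sqrt (sqr (scaleFloat mk x) + sqr (scaleFloat mk y)))
  where
    k     = max (exponent x) (exponent y)
    mk    = negate k
    sqr v = v * v

-- Hypothetical workload: sum the magnitudes of n synthetic values.
sumMagnitudes :: (Complex Double -> Double) -> Int -> Double
sumMagnitudes f n = foldl' step 0 [1 .. n]
  where
    step acc i = acc + f (fromIntegral i :+ fromIntegral (n - i))

-- Run one variant and report the CPU time (getCPUTime is in picoseconds).
timeIt :: String -> (Complex Double -> Double) -> IO ()
timeIt label f = do
  start <- getCPUTime
  total <- evaluate (sumMagnitudes f 1000000)
  end   <- getCPUTime
  let secs = fromIntegral (end - start) / 1e12 :: Double
  printf "%-24s %8.3f s  (sum = %g)\n" label secs total

main :: IO ()
main = do
  timeIt "Data.Complex.magnitude" magnitude
  timeIt "non-scaling" nonScaling
  timeIt "scaling, strict square" scalingStrict
```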

interestingly, just pasting the original ghc library implementation seems to
slow things down considerably (0m12.264s) when compiling with

-O2
-funbox-strict-fields
-fvia-C
-optc-O2
-fdicts-cheap
-fno-method-sharing
-fglasgow-exts

when leaving out -fdicts-cheap and -fno-method-sharing, the execution time
for the pasted library code drops to 0m6.873s. it seems that some options
that are useful (or even necessary?) for stream fusion rule reduction may
produce non-optimal code in other cases?

<sk>

_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
