My current best try uses the uvector package, has two 'vectors' of type
(UArr Double)  as input, and relies on the sumU and zipWithU functions
which use streaming to compute the result:

dist_fast :: UArr Double -> UArr Double -> Double
dist_fast p1 p2 = sumDs `seq` sqrt sumDs
        where
                sumDs         = sumU ds
                ds            = zipWithU euclidean p1 p2
                euclidean x y = d*d
                        where
                                d = x-y

You'll probably want to make sure that 'euclidian' is specialized to
the types you need (here 'Double'), not used overloaded for 'Num a=>a'
(check -ddump-tc, or -ddump-simpl output).

After that, unrolling the fused fold loop (uvector internal) might help
a bit, but isn't there yet:

http://hackage.haskell.org/trac/ghc/ticket/3123
http://hackage.haskell.org/trac/ghc/wiki/Inlining

And even if that gets implemented, it doesn't apply directly to your
case, where the loop is in a library, but you might want to control its
unrolling in your client code. Having the loop unrolled by a default
factor (8x or so) should help for loops like this, with little computation.

Claus


_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Reply via email to