I sat down tonight and did tons of good learning (which was my goal). Yes, the variable names in the unrolling is a little "ugly" but it helps to read the C++ version for context. There are two for loops (advN is each inner one unrolled). the other function names match the C++ version. It was my goal to implement an unrolled version of that.
Fortunately, my performance is excellent now. It is only 4x slower than the C++ version and 2x slower than the Haskell one listed (which uses pointer trickery). I am sure there could be more done but I am at my limit of comprehension. But if I may guess, I would say that any speed issues now are related to a lack of in place updating for variables and structures. I'd also like to thank everyone for their help so far. I have attached my latest version. --ryan On Nov 27, 2007 7:14 PM, Sterling Clover <[EMAIL PROTECTED]> wrote: > The first step would be profiling -- i.e. compiling with -prof -auto- > all to tag each function as a cost center, then running with +RTS -p > to generate a cost profile. The problem here is you've got massive > amounts of unrolling done already, so it's sort of hard to figure out > what's doing what, and the names you've given the unrolled functions > are... less than helpful. (first rule of optimization: optimize > later.) The use of tuples shouldn't be a problem per se in terms of > performance, but it really hurts readability to lack clear type > signatures and types. You'd probably be better off constructing a > vector data type as does the current Haskell entry -- and by forcing > it to be strict and unboxed (you have nearly no strictness > annotations I note -- and recall that $! only evaluates its argument > to weak head normal form, which means that you're just checking if > the top-level constructor is _|_) you'll probably get better > performance to boot. In any case, declaring type aliases for the > various units you're using would also help readability quite a bit. > > --S > > On Nov 27, 2007, at 5:41 PM, Ryan Dickie wrote: > > > I thought it would be a nice exercise (and a good learning > > experience) to try and solve the nbody problem from the debian > > language shootout. Unfortunately, my code sucks. There is a massive > > space leak and performance is even worse. On the bright side, my > > implementation is purely functional. My twist: I manually unrolled > > a few loops from the C++ version. > > > > I believe that most of my performance problems stem from my abuse > > of tuple. The bodies are passed as a tuple of planets, a planet is > > a tuple of (position, velocity, mass) and the vectors position and > > velocity are also tuples of type double. My lame justification for > > that is to make it nice and handy to pass data around. > > > > Any tips would be greatly appreciated. > > > > --ryan > > <nbody3.hs> > > _______________________________________________ > > Haskell-Cafe mailing list > > Haskell-Cafe@haskell.org > > http://www.haskell.org/mailman/listinfo/haskell-cafe > >
{-# OPTIONS_GHC -O2 -fbang-patterns -funbox-strict-fields #-} {-- The Great Computer Language Shootout http://shootout.alioth.debian.org/ N-body problem C version contributed by Christoph Bauer converted to C++ and modified by Paul Kitchin Haskell version by Ryan Dickie based on the above two With great help from ddarius, Spencer Janssen, and #haskell --} import System import Text.Printf data Vector3 = V !Double !Double !Double V a b c .+ V x y z = V (a+x) (b+y) (c+z) V a b c .- V x y z = V (a-x) (b-y) (c-z) x .* V a b c = V (x*a) (x*b) (x*c) mag2 !(V x y z) = x*x + y*y + z*z --some getter functions pVel !(_,vel,_) = vel pPos !(pos,_,_) = pos pMass !(!_,!_,!mass) = mass days_per_year = 365.24::Double solar_mass = (4::Double) * pi * pi delta_time = 0.01::Double planets = ( p1,p2,p3,p4,p5 ) where p1 = ( V 0 0 0, V 0 0 0, solar_mass ) p2 = ( p2pos,p2vel,p2mass ) p3 = ( p3pos,p3vel,p3mass ) p4 = ( p4pos,p4vel,p4mass ) p5 = ( p5pos,p5vel,p5mass ) p2pos = V 4.84143144246472090e+00 (-1.16032004402742839e+00) (-1.03622044471123109e-01) p2vel = days_per_year .* V 1.66007664274403694e-03 7.69901118419740425e-03 (-6.90460016972063023e-05) p2mass = 9.54791938424326609e-04 * solar_mass p3pos = V 8.34336671824457987e+00 4.12479856412430479e+00 (-4.03523417114321381e-01) p3vel = days_per_year .* V (-2.76742510726862411e-03) 4.99852801234917238e-03 2.30417297573763929e-05 p3mass = 2.85885980666130812e-04 * solar_mass p4pos = V 1.28943695621391310e+01 (-1.51111514016986312e+01) (-2.23307578892655734e-01) p4vel = days_per_year .* V 2.96460137564761618e-03 2.37847173959480950e-03 (-2.96589568540237556e-05) p4mass = 4.36624404335156298e-05 * solar_mass p5pos = V 1.53796971148509165e+01 (-2.59193146099879641e+01) 1.79258772950371181e-01 p5vel = days_per_year .* V 2.68067772490389322e-03 1.62824170038242295e-03 (-9.51592254519715870e-05) p5mass = 5.15138902046611451e-05 * solar_mass update (!p1,!p2,!p3,!p4,!p5) = ( up p1,up p2,up p3,up p4,up p5 ) where up (!pos,!vel,!mass) = ( (pos .+ (delta_time .* vel)),vel,mass ) offset_momentum (!p1,p2,p3,p4,p5) = ( pp1,p2,p3,p4,p5 ) where pp1 = ( pPos p1,ppvel,pMass p1 ) ppvel = (-1.0 / solar_mass) .* momentum momentum = mul p2 .+ mul p3 .+ mul p4 .+ mul p5 mul (!_,vel,mass) = mass .* vel --trick here is to unroll each loop.. based on the c++ version --advance !ps = update $! adv4 $! adv3 $! adv2 $! adv1 ps advance = update . adv4 . adv3 . adv2 . adv1 adv4 (!p1,p2,p3,p4,p5) = ( p1,p2,p3,pp4,pp5 ) where (pp4,pp5) = innerLoop p4 p5 adv3 (!p1,p2,p3,p4,p5) = ( p1,p2,pp3,pp4,pp5 ) where (tp3,pp4) = innerLoop p3 p4 (pp3,pp5) = innerLoop tp3 p5 adv2 (!p1,p2,p3,p4,p5) = ( p1,pp2,pp3,pp4,pp5 ) where (t1p2,pp3) = innerLoop p2 p3 (t2p2,pp4) = innerLoop t1p2 p4 (pp2,pp5) = innerLoop t2p2 p5 adv1 (!p1,p2,p3,p4,p5) = ( pp1,pp2,pp3,pp4,pp5 ) where (t1p1,pp2) = innerLoop p1 p2 (t2p1,pp3) = innerLoop t1p1 p3 (t3p1,pp4) = innerLoop t2p1 p4 (pp1,pp5) = innerLoop t3p1 p5 innerLoop !p1 !p2 = ( pp1,pp2 ) where !difference = (pPos p1) .- (pPos p2) !distance_squared = mag2 difference !distance = sqrt distance_squared !magnitude = delta_time / (distance * distance_squared) !planet2_mass_magnitude = (pMass p2) * magnitude !planet1_mass_magnitude = (pMass p1) * magnitude !pp1 = ( pPos p1, (pVel p1) .- (planet2_mass_magnitude .* difference), pMass p1 ) !pp2 = ( pPos p2, (pVel p2) .+ (planet1_mass_magnitude .* difference), pMass p2 ) energy (!p1,p2,p3,p4,p5) = sum2 where sum2 = loop5 + loop4 + loop3 + loop2 + loop1 loop0 (!pos,!vel,!mass) = (0.5::Double) * mass * (mag2 vel) loop5 = (loop0 p5) loop4 = (loop0 p4) - (delE p4 p5) loop3 = (loop0 p3) - (delE p3 p5) - (delE p3 p4) loop2 = (loop0 p2) - (delE p2 p5) - (delE p2 p4) - (delE p2 p3) loop1 = (loop0 p1) - (delE p1 p5) - (delE p1 p4) - (delE p1 p3) - (delE p1 p2) delE (pos,_,mass) (pos2,_,mass2) = (mass * mass2) / (sqrt $! mag2 (pos .- pos2)) runIt 0 args = args runIt cnt args = runIt (cnt-1) $! advance args main :: IO() main = do n <- getArgs >>= readIO.head let ps = offset_momentum planets printf "%.9f\n" $ energy ps let results = runIt n ps printf "%.9f\n" $ energy results
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe