I sat down tonight and did tons of good learning (which was my goal). Yes,
the variable names in the unrolling is a little "ugly" but it helps to read
the C++ version for context. There are two for loops (advN is each inner one
unrolled). the other function names match the C++ version.  It was my goal
to implement an unrolled version of that.

Fortunately, my performance is excellent now. It is only 4x slower than the
C++ version and 2x slower than the Haskell one listed (which uses pointer
trickery). I am sure there could be more done but I am at my limit of
comprehension. But if I may guess, I would say that any speed issues now are
related to a lack of in place updating for variables and structures.

I'd also like to thank everyone for their help so far. I have attached my
latest version.

--ryan

On Nov 27, 2007 7:14 PM, Sterling Clover <[EMAIL PROTECTED]> wrote:

> The first step would be profiling -- i.e. compiling with -prof -auto-
> all to tag each function as a cost center, then running with +RTS -p
> to generate a cost profile. The problem here is you've got massive
> amounts of unrolling done already, so it's sort of hard to figure out
> what's doing  what, and the names you've given the unrolled functions
> are... less than helpful. (first rule of optimization: optimize
> later.)  The use of tuples shouldn't be a problem per se in terms of
> performance, but it really hurts readability to lack clear type
> signatures and types. You'd probably be better off constructing a
> vector data type as does the current Haskell entry -- and by forcing
> it to be strict and unboxed (you have nearly no strictness
> annotations I note -- and recall that $! only evaluates its argument
> to weak head normal form, which means that you're just checking if
> the top-level constructor is _|_) you'll probably get better
> performance to boot. In any case, declaring type aliases for the
> various units you're using would also help readability quite a bit.
>
> --S
>
> On Nov 27, 2007, at 5:41 PM, Ryan Dickie wrote:
>
> > I thought it would be a nice exercise (and a good learning
> > experience) to try and solve the nbody problem from the debian
> > language shootout. Unfortunately, my code sucks. There is a massive
> > space leak and performance is even worse. On the bright side, my
> > implementation is purely functional. My twist: I manually unrolled
> > a few loops from the C++ version.
> >
> > I believe that most of my performance problems stem from my abuse
> > of tuple. The bodies are passed as a tuple of planets, a planet is
> > a tuple of (position, velocity, mass) and the vectors position and
> > velocity are also tuples of type double. My lame justification for
> > that is to make it nice and handy to pass data around.
> >
> > Any tips would be greatly appreciated.
> >
> > --ryan
> > <nbody3.hs>
> > _______________________________________________
> > Haskell-Cafe mailing list
> > Haskell-Cafe@haskell.org
> > http://www.haskell.org/mailman/listinfo/haskell-cafe
>
>
{-# OPTIONS_GHC -O2 -fbang-patterns -funbox-strict-fields #-}
{--
	The Great Computer Language Shootout
	http://shootout.alioth.debian.org/

	N-body problem

	C version contributed by Christoph Bauer
	converted to C++ and modified by Paul Kitchin
	Haskell version by Ryan Dickie based on the above two
	With great help from ddarius, Spencer Janssen, and #haskell
--}
import System
import Text.Printf

data Vector3 = V !Double !Double !Double

V a b c .+ V x y z = V (a+x) (b+y) (c+z)
V a b c .- V x y z = V (a-x) (b-y) (c-z)
x .* V a b c = V (x*a) (x*b) (x*c)
mag2 !(V x y z) = x*x + y*y + z*z

--some getter functions
pVel !(_,vel,_) = vel
pPos !(pos,_,_) = pos
pMass !(!_,!_,!mass) = mass

days_per_year = 365.24::Double
solar_mass = (4::Double) * pi * pi
delta_time = 0.01::Double

planets = ( p1,p2,p3,p4,p5 ) where
	p1 = ( V 0 0 0, V 0 0 0, solar_mass )
	p2 = ( p2pos,p2vel,p2mass )
	p3 = ( p3pos,p3vel,p3mass )
	p4 = ( p4pos,p4vel,p4mass )
	p5 = ( p5pos,p5vel,p5mass )
	p2pos =  V 4.84143144246472090e+00 (-1.16032004402742839e+00) (-1.03622044471123109e-01)
	p2vel = days_per_year .* V 1.66007664274403694e-03 7.69901118419740425e-03 (-6.90460016972063023e-05)
	p2mass = 9.54791938424326609e-04 * solar_mass
	p3pos = V 8.34336671824457987e+00 4.12479856412430479e+00 (-4.03523417114321381e-01)
	p3vel = days_per_year .* V (-2.76742510726862411e-03) 4.99852801234917238e-03 2.30417297573763929e-05
	p3mass = 2.85885980666130812e-04 * solar_mass
	p4pos = V 1.28943695621391310e+01 (-1.51111514016986312e+01) (-2.23307578892655734e-01)
	p4vel = days_per_year .* V 2.96460137564761618e-03 2.37847173959480950e-03 (-2.96589568540237556e-05)
	p4mass = 4.36624404335156298e-05 * solar_mass
	p5pos = V 1.53796971148509165e+01 (-2.59193146099879641e+01) 1.79258772950371181e-01
	p5vel = days_per_year .* V 2.68067772490389322e-03 1.62824170038242295e-03 (-9.51592254519715870e-05)
	p5mass = 5.15138902046611451e-05 * solar_mass

update (!p1,!p2,!p3,!p4,!p5) = ( up p1,up p2,up p3,up p4,up p5 ) where
	up (!pos,!vel,!mass) = ( (pos .+ (delta_time .* vel)),vel,mass )

offset_momentum (!p1,p2,p3,p4,p5) = ( pp1,p2,p3,p4,p5 ) where
		pp1 = ( pPos p1,ppvel,pMass p1 ) 
		ppvel =  (-1.0 / solar_mass) .* momentum
		momentum = mul p2 .+ mul p3 .+ mul p4 .+ mul p5
		mul (!_,vel,mass) = mass .* vel

--trick here is to unroll each loop.. based on the c++ version
--advance !ps = update $! adv4 $! adv3 $! adv2 $! adv1 ps
advance = update . adv4 . adv3 . adv2 . adv1


adv4 (!p1,p2,p3,p4,p5) = ( p1,p2,p3,pp4,pp5 ) where
	(pp4,pp5) = innerLoop p4 p5

adv3 (!p1,p2,p3,p4,p5) = ( p1,p2,pp3,pp4,pp5 ) where
	(tp3,pp4) = innerLoop p3 p4
	(pp3,pp5) = innerLoop tp3 p5

adv2 (!p1,p2,p3,p4,p5) = ( p1,pp2,pp3,pp4,pp5 ) where
	(t1p2,pp3) = innerLoop p2 p3
	(t2p2,pp4) = innerLoop t1p2 p4
	(pp2,pp5) = innerLoop t2p2 p5
	
adv1 (!p1,p2,p3,p4,p5) = ( pp1,pp2,pp3,pp4,pp5 ) where
	(t1p1,pp2) = innerLoop p1 p2
	(t2p1,pp3) = innerLoop t1p1 p3
	(t3p1,pp4) = innerLoop t2p1 p4
	(pp1,pp5) = innerLoop t3p1 p5

innerLoop !p1 !p2 = ( pp1,pp2 ) where
	!difference = (pPos p1) .- (pPos p2)
	!distance_squared = mag2 difference
	!distance = sqrt distance_squared
	!magnitude = delta_time / (distance * distance_squared)
	!planet2_mass_magnitude = (pMass p2) * magnitude
	!planet1_mass_magnitude = (pMass p1) * magnitude
	!pp1 = ( pPos p1, (pVel p1) .- (planet2_mass_magnitude .* difference), pMass p1 )
	!pp2 = ( pPos p2, (pVel p2) .+ (planet1_mass_magnitude .* difference), pMass p2 )


energy (!p1,p2,p3,p4,p5) = sum2 where
	sum2 = loop5 + loop4 + loop3 + loop2 + loop1
	loop0 (!pos,!vel,!mass) = (0.5::Double) * mass * (mag2 vel)
	loop5 = (loop0 p5)
	loop4 = (loop0 p4) - (delE p4 p5)
	loop3 = (loop0 p3) - (delE p3 p5) - (delE p3 p4)
	loop2 = (loop0 p2) - (delE p2 p5) - (delE p2 p4) - (delE p2 p3)
	loop1 = (loop0 p1) - (delE p1 p5) - (delE p1 p4) - (delE p1 p3) - (delE p1 p2)
	delE (pos,_,mass) (pos2,_,mass2) = (mass * mass2) / (sqrt $! mag2 (pos .- pos2))

runIt 0 args = args
runIt cnt args = runIt (cnt-1) $! advance args

main :: IO()
main = do
	n <- getArgs >>= readIO.head
	let ps = offset_momentum planets
	printf "%.9f\n" $ energy ps
	let results = runIt n ps
	printf "%.9f\n" $ energy results

_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Reply via email to