On 17/02/2013, at 3:23 PM, john skaller wrote:

> So now, the clue to the problem in the optimiser. ..

So only one mod makes any difference:

proc Advance(dt: double){
   var i: int; var j:int;
   for i in 0 upto nbodies - 1 do
      val pi = &bodies.i;
      val bimass = pi*.mass;
      var delta = 0.0,0.0,0.0;
      for j in i+1 upto nbodies - 1 do
         val pj = &bodies.j;
         val bjmass = pj*.mass;
         val d = pi*.pos - pj*.pos;
         val d2 = sqr d;
         val distance = sqrt d2;

         delta = delta + d * bjmass * dt / (distance * d2);
         pj.vel <- pj*.vel + d * bimass * dt / (distance * d2);
      done;
      pi.vel <- pi*.vel - delta;
   done;

   for i in 0 upto nbodies - 1 do
      pi = &bodies.i;
      pi.pos <- pi*.pos + dt * pi*.vel;
   done;
}

This is 23 seconds, exactly the same as the original routine EXCEPT
that I factored out the common expression:

      val pi = &bodies.i;
         val pj = &bodies.j;

(which the C code does too).

So I'm pretty sure what's happening now (without examining the
generated C++). When we calculate something like this:

      bodies.i.vel = bodies.i.vel - delta;

It's grabbing the whole of bodies . i and then grabbing
the velocity component of that .. instead of just grabbing the
velocity component of the i'th planet. And since this calculation
splits in 3 for the x,y,z components, it's probably grabbing
the whole velocity then getting the x component, the whole
delta, then the x component, and subtracting. Repeat for
y and z.

In other words given this kind of thing:

        var a = 1,2,3;
        var x = a.0; var y = a.1; var z = a.2;

it's effectively doing this:

        x = (1,2,3) . 0; y = (1,2,3) . 1; z = (1,2,3).2;

Actually in the calculations we're doing maths, so 

        v = v1 + v2 + v3

is basically

        v.0 =  (v1.0 + v2.0, v1.1 + v2.1, v1.2 + v2.3).0 + ...
                (...)

instead of parallelising the whole calculation.

This is not just 

        tmp = v1 + v2;
        v = tmp + v2;

because if it does partial parallelisation by substitution the whole
calculation is repeated an exponential number of times.

Felix should handle this for tuples and arrays. It may have more trouble
for structure members. This is because struct's are constructed by
first making a tuple and then casting it. So it's hard to see "inside"
the cast.

For example given

        struct X { int a; int b; int c; };
        var x = X(1,2,3).a;

should reduce to 

        x = 1;

without needing to actually make the struct X temporary,
but it probably doesn't because it's building a tuple,
casting it to a struct, then extracting the component a,
and the correspondence of a with the first value of the tuple
is not direct.


--
john skaller
skal...@users.sourceforge.net
http://felix-lang.org




------------------------------------------------------------------------------
The Go Parallel Website, sponsored by Intel - in partnership with Geeknet, 
is your hub for all things parallel software development, from weekly thought 
leadership blogs to news, videos, case studies, tutorials, tech docs, 
whitepapers, evaluation guides, and opinion stories. Check out the most 
recent posts - join the conversation now. http://goparallel.sourceforge.net/
_______________________________________________
Felix-language mailing list
Felix-language@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/felix-language

Reply via email to