Sadly, this function is pretty close to the real workload. I do n-body
simulations of planetary systems. In this problem, the value of "n" is
small, but the simulation runs for a very long time. So this nested for
loop will be called maybe 10 billion times in the course of one simulation,
and that's where all the CPU time goes.

I already have simulation software that works well enough for this. I just
wanted to experiment with Julia to see if this could be made parallel. An
irritating problem with all the codes that solve planetary systems is that
they are all serial -- this problem is apparently hard to parallelize.


Cheers,
Daniel.



On 17 June 2015 at 17:02, Tim Holy <[email protected]> wrote:

> You're copying a lot of data between processes. Check out SharedArrays.
> But I
> still fear that if each "job" is tiny, you may not get as much benefit
> without
> further restructuring.
>
> I trust that your "real" workload will take more than 1ms. Otherwise, it's
> very unlikely that your experiments in parallel programming will end up
> saving
> you time :-).
>
> --Tim
>
> On Wednesday, June 17, 2015 06:37:28 AM Daniel Carrera wrote:
> > Hi everyone,
> >
> > My adventures with parallel programming with Julia continue. Here is a
> > different issue from other threads: My parallel function is 8300x slower
> > than my serial function even though I am running on 4 processes on a
> > multi-core machine.
> >
> > julia> nprocs()
> > 4
> >
> > I have Julia 0.3.8. Here is my program in its entirety (not very long).
> >
> > function main()
> >
> > nbig::Int16 = 7
> > nbod::Int16 = nbig
> > bod  = Float64[
> >         0       1      2      3      4      5      6  # x position
> >         0       0      0      0      0      0      0  # y position
> >         0       0      0      0      0      0      0  # z position
> >         0       0      0      0      0      0      0  # x velocity
> >         0       0      0      0      0      0      0  # y velocity
> >         0       0      0      0      0      0      0  # z velocity
> >         1       1      1      1      1      1      1  # Mass
> >     ]
> >
> >     a = zeros(3,nbod)
> >
> >     @time for k = 1:1000
> >         gravity_1!(bod, nbig, nbod, a)
> >     end
> >     println(a[1,:])
> >
> >     @time for k = 1:1000
> >         gravity_2!(bod, nbig, nbod, a)
> > end
> >     println(a[1,:])
> > end
> >
> > function gravity_1!(bod, nbig, nbod, a)
> >
> >     for i = 1:nbod
> >         a[1,i] = 0.0
> >         a[2,i] = 0.0
> >         a[3,i] = 0.0
> >     end
> >
> >     @inbounds for i = 1:nbig
> >         for j = (i + 1):nbod
> >
> >             dx = bod[1,j] - bod[1,i]
> >             dy = bod[2,j] - bod[2,i]
> >             dz = bod[3,j] - bod[3,i]
> >
> >             s_1 = 1.0 / sqrt(dx*dx+dy*dy+dz*dz)
> >             s_3 = s_1 * s_1 * s_1
> >
> >             tmp1 = s_3 * bod[7,i]
> >             tmp2 = s_3 * bod[7,j]
> >
> >             a[1,j] = a[1,j] - tmp1*dx
> >             a[2,j] = a[2,j] - tmp1*dy
> >             a[3,j] = a[3,j] - tmp1*dz
> >
> >             a[1,i] = a[1,i] + tmp2*dx
> >             a[2,i] = a[2,i] + tmp2*dy
> >             a[3,i] = a[3,i] + tmp2*dz
> >         end
> >     end
> >     return a
> > end
> >
> > function gravity_2!(bod, nbig, nbod, a)
> >
> >     for i = 1:nbod
> >         a[1,i] = 0.0
> >         a[2,i] = 0.0
> >         a[3,i] = 0.0
> >     end
> >
> >     @inbounds @sync @parallel for i = 1:nbig
> >         for j = (i + 1):nbod
> >
> >             dx = bod[1,j] - bod[1,i]
> >             dy = bod[2,j] - bod[2,i]
> >             dz = bod[3,j] - bod[3,i]
> >
> >             s_1 = 1.0 / sqrt(dx*dx+dy*dy+dz*dz)
> >             s_3 = s_1 * s_1 * s_1
> >
> >             tmp1 = s_3 * bod[7,i]
> >             tmp2 = s_3 * bod[7,j]
> >
> >             a[1,j] = a[1,j] - tmp1*dx
> >             a[2,j] = a[2,j] - tmp1*dy
> >             a[3,j] = a[3,j] - tmp1*dz
> >
> >             a[1,i] = a[1,i] + tmp2*dx
> >             a[2,i] = a[2,i] + tmp2*dy
> >             a[3,i] = a[3,i] + tmp2*dz
> >         end
> >     end
> >     return a
> > end
> >
> >
> >
> > So this is a straight forward N-body gravity calculation. Yes, I realize
> > that gravity_2!() is wrong, but that's fine. Right now I'm just talking
> > about the CPU time. When I run this on my computer I get:
> >
> > julia> main()
> > elapsed time: 0.000475294 seconds (0 bytes allocated)
> > [1.4913888888888889 0.4636111111111111 0.1736111111111111
> > -5.551115123125783e-17 -0.17361111111111116 -0.4636111111111112
> > -1.4913888888888889]
> > elapsed time: 3.953546654 seconds (126156320 bytes allocated, 13.49% gc
> > time)
> > [0.0 0.0 0.0 0.0 0.0 0.0 0.0]
> >
> >
> > So, the serial version takes 0.000475 seconds and the parallel takes 3.95
> > seconds. Furthermore, the parallel version is calling the garbage
> > collector. I suspect that the problem has something to do with the memory
> > access. Maybe the parallel code is wasting a lot of time copying
> variables
> > in memory. But whatever the reason, this is bad. The documentation says
> > that @parallel is supposed to be fast, even for very small loops, but
> > that's not what I'm seeing. A non-buggy implementation will be even
> slower.
> >
> > Have I missed something? Is there an obvious error in how I'm using the
> > parallel constructs?
> >
> > I would appreciate any guidance you may offer.
> >
> > Cheers,
> > Daniel.
>
>


-- 
When an engineer says that something can't be done, it's a code phrase that
means it's not fun to do.

Reply via email to