Hey Erik,

You're right that addprocs() is in the wrong place; I must have made a mistake 
when copy-pasting the example.
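
Here is a minimal sketch of the intended ordering. It assumes a current Julia 1.x release, where addprocs, @everywhere and remotecall_fetch live in the Distributed standard library (in 2014 they were in Base). The function scale2 is a hypothetical placeholder and is not part of the attached example:

```julia
using Distributed

# Start the workers first: @everywhere only reaches processes that
# already exist at the moment it runs.
addprocs(4)

# Hypothetical placeholder function, now defined on the master and all 4 workers.
@everywhere scale2(x) = 2x

# Run it on the first worker and fetch the result back to the master.
result = remotecall_fetch(scale2, first(workers()), 21)
println(result)  # prints 42
```

The only point is the order: create the workers first, then call @everywhere, so that the definition actually reaches them.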

As I pointed out, copying A and v on every call is very naive and 
suboptimal. The real implementation I'm focused on is much more efficient 
in this regard.
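
For anyone following along, here is a rough sketch of the alternative Erik describes below: ship A and v to every process once, then send only a row range per call. It assumes Julia 1.x with Distributed. The names A_w, v_w and rows_times_v and the problem size are made up for illustration:

```julia
using Distributed
addprocs(4)

# Hypothetical problem size and data, created on the master.
n = 1000
A = rand(n, n)
v = rand(n)

# Copy A and v to every process once; `$` interpolates the master's values.
@everywhere const A_w = $A
@everywhere const v_w = $v

# Each call only ships a row range; A_w and v_w are already resident.
@everywhere rows_times_v(rows) = A_w[rows, :] * v_w

# One block of 250 rows per worker (assumes exactly 4 workers).
chunks = [1:250, 251:500, 501:750, 751:1000]
parts = [remotecall(rows_times_v, w, r) for (w, r) in zip(workers(), chunks)]
y = reduce(vcat, fetch.(parts))
```

Interpolating $A into the @everywhere expression sends each worker one copy up front instead of one copy per call.
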
However, that isn't the problem I wanted to point out. The @time on the 
workers only runs after all communication and serialization work is 
finished, so it should time just the actual execution.
At least, that is the behavior I expect (hope) it has :) The issue, then, 
is that this runtime of the actual execution (without overhead) is somehow 
2x larger when multiple workers are active.
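
One way to check this in isolation, sketched under a few assumptions (Julia 1.x with Distributed; timed_work is a hypothetical kernel, not the attached do_stuff). Each worker generates its own data and times only its loop. The times are then compared for one busy worker versus all workers busy at once:

```julia
using Distributed
addprocs(4)

# Hypothetical kernel: the data is generated on the worker itself, so nothing
# is copied from the master, and only the loop is timed (like @time in do_stuff).
@everywhere function timed_work(n, reps)
    x = rand(n)
    s = 0.0
    t = @elapsed for _ in 1:reps
        for xi in x
            s += xi * xi
        end
    end
    return t, s  # returning s keeps the loop from being optimized away
end

n, reps = 10^6, 100

# Warm-up: compile timed_work on every worker before measuring.
foreach(fetch, [remotecall(timed_work, w, 10, 1) for w in workers()])

# Case 1: one worker busy, the others idle.
t_single, _ = remotecall_fetch(timed_work, first(workers()), n, reps)

# Case 2: all workers busy at the same time.
futures = [remotecall(timed_work, w, n, reps) for w in workers()]
t_all = [first(fetch(f)) for f in futures]

println("one worker:  ", t_single, " s")
println("all workers: ", t_all, " s")
```

If the per-worker times in the second case are clearly larger than in the first, the slowdown is in the execution itself rather than in communication.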

Tom

On Thursday, November 27, 2014 6:06:06 PM UTC+1, Erik Schnetter wrote:
>
> You need to use addprocs before the first @everywhere. I assume you 
> actually did this, since otherwise you'd have received an error. 
>
> It seems that your variables A and v are stored on the master, not on 
> the workers. Since they are inputs to do_stuff, they need to be copied 
> there every time. Note that the whole array v is copied although only 
> part of it is accessed. Maybe sending data to 4 processes 
> simultaneously has an overhead and is somehow much slower than 
> sending the data one at a time. 
>
> To check whether this is true, you can add a loop within do_stuff to 
> execute the routine multiple times. This would increase the workload, 
> but keep the communication overhead the same. 
>
> If this is the case, you can remedy it by storing A and v 
> (better: only the necessary part of v) on all processes instead of 
> copying them. 
>
> -erik 
>
>
> On Thu, Nov 27, 2014 at 10:20 AM, Tom wrote: 
> > Hey everyone, 
> > 
> > I've been looking at parallel programming in Julia and have been getting 
> > some very unexpected results, and rather bad performance because of them. 
> > Sadly, I've run out of ideas about what could be going on; I've disproved 
> > every idea I had. Hence this post :) 
> > 
> > I was able to construct a similar (simpler) example that exhibits the 
> > same behavior (attached file). 
> > The example is a very naive and suboptimal implementation in many ways 
> > (the actual code is much better optimized), but that's not the issue. 
> > 
> > The issue I'm trying to investigate is the big difference in worker time 
> > between when a single worker is active and when multiple workers are active. 
> > 
> > Ideas I disproved: 
> >   - Julia processes pinned to a single core 
> >   - each Julia process using multiple threads for the work, so that the 
> >     processes fight over the cores 
> >   - not enough cores on the machine (there are plenty) 
> >   - htop nicely shows 4 Julia processes working on different cores 
> >   - there is no communication at the application level stalling anyone 
> > 
> > All I'm left with now is that Julia is doing some hidden synchronization 
> > somewhere. 
> > Any input is appreciated. Thanks in advance. 
> > 
> > Kind regards, 
> > Tom 
>
>
>
> -- 
> Erik Schnetter 
> http://www.perimeterinstitute.ca/personal/eschnetter/ 
>
