A typical cache is much smaller than these matrices. And the L3 cache is probably shared between all cores, so it would have to hold four copies of the data (one per worker process).

-erik
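A rough back-of-the-envelope check, assuming Float64 elements and the 1000x1000 A and 1000x10000 v from the test script quoted below (variable names here are just for illustration):

----------------------------
# Approximate footprint of the test data (Float64 = 8 bytes per element).
a_bytes = 8 * 1000 * 1000            # A: 8 MB
v_bytes = 8 * 1000 * 10_000          # v: 80 MB
per_process = a_bytes + v_bytes      # ~88 MB held by each julia process
four_processes = 4 * per_process
println(four_processes / 1e6, " MB") # 352 MB across the four processes
----------------------------

Even a single copy is far bigger than an L3 cache of a few tens of MB, so with four processes running, the working set lives in main memory and the processes end up competing for memory bandwidth.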
On Fri, Nov 28, 2014 at 5:22 PM, <[email protected]> wrote:
> I'd have to investigate this further, but it seems quite odd that there
> would be a memory bandwidth problem, since the matrices are quite small and
> should even fit in cache.
> I don't know how julia is using so much memory. 166 MB seems really steep.
>
> On Thursday, November 27, 2014 6:47:34 PM UTC+1, Amit Murthy wrote:
>>
>> I tried after factoring out the cost of sending A and v over, but no dice.
>>
>> See https://gist.github.com/amitmurthy/3206a4f61cf6cd6000ee
>>
>> Even with a loop of 4 within do_stuff, same behavior.
>>
>> I think I found the reason in an old thread -
>> https://groups.google.com/d/msg/julia-users/jlKoEtErRL4/UTN7FSlZDgoJ
>>
>> To confirm the same, save
>>
>> ----------------------------
>> const A = randn(1000, 1000);
>> const numx = 10_000;
>> const v = randn(1000, numx);
>>
>> do_stuff(r::UnitRange) = ([do_stuff(mm) for mm in r]; nothing)
>> function do_stuff(mm::Int64)
>>     for x in 1:4
>>         sum( A * v[:, mm] )
>>     end
>> end
>>
>> chunks = Base.splitrange(numx, 4)
>>
>> do_stuff(chunks[1]);
>> ----------------------------
>>
>> in a new file p.jl, and from the command line run it in parallel like this
>>
>> julia p.jl &
>>
>> four times in quick succession.
>>
>> The performance is heavily degraded compared to the serial version.
>>
>> On Thu, Nov 27, 2014 at 10:35 PM, Erik Schnetter <[email protected]> wrote:
>>>
>>> You need to use addprocs before the first @everywhere. I assume you
>>> actually did this, since otherwise you'd have received an error.
>>>
>>> It seems that your variables A and v are stored on the master, not on
>>> the workers. Since they are inputs to do_stuff, they need to be copied
>>> there every time. Note that the whole array v is copied although only
>>> part of it is accessed. Maybe sending data to 4 processes
>>> simultaneously has an overhead, and is somehow much slower than
>>> sending the data one at a time.
>>>
>>> To check whether this is true, you can add a loop within do_stuff to
>>> execute the routine multiple times. This would increase the workload,
>>> but keep the communication overhead the same.
>>>
>>> If this is the case, then to remedy it, you would store A and v
>>> (better: only the necessary part of v) on all processes instead of
>>> copying them.
>>>
>>> -erik
>>>
>>> On Thu, Nov 27, 2014 at 10:20 AM, <[email protected]> wrote:
>>> > Hey everyone,
>>> >
>>> > I've been looking at parallel programming in julia and was getting some
>>> > very unexpected results and rather bad performance because of this.
>>> > Sadly, I ran out of ideas of what could be going on, having disproved
>>> > all the ideas I had. Hence this post :)
>>> >
>>> > I was able to construct a similar (simpler) example which exhibits the
>>> > same behavior (attached file).
>>> > The example is very naive and suboptimal in many ways (the actual code
>>> > is much better optimized), but that's not the issue.
>>> >
>>> > The issue I'm trying to investigate is the big difference in worker
>>> > time when a single worker is active and when multiple are active.
>>> >
>>> > Ideas I disproved:
>>> > - julia processes pinned to a single core
>>> > - julia process uses multiple threads to do the work, and processes
>>> >   are fighting for the cores
>>> > - not enough cores on the machine (there are plenty)
>>> > - htop nicely shows 4 julia processes working on different cores
>>> > - there is no communication at the application level stalling anyone
>>> >
>>> > All I'm left with now is that julia is doing some hidden
>>> > synchronization somewhere.
>>> > Any input is appreciated. Thanks in advance.
>>> >
>>> > Kind regards,
>>> > Tom
>>>
>>> --
>>> Erik Schnetter <[email protected]>
>>> http://www.perimeterinstitute.ca/personal/eschnetter/

--
Erik Schnetter <[email protected]>
http://www.perimeterinstitute.ca/personal/eschnetter/
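For reference, a minimal sketch of the remedy Erik describes above: keep A and v resident on every worker so that only a small UnitRange crosses the wire per task. It assumes 4 workers, the 0.3-era parallel primitives used elsewhere in this thread, and the Base.splitrange helper from Amit's script; each process also generates its own random copy of the data, which is fine for a timing comparison but not for real work.

----------------------------
addprocs(4)

# Define the data and the kernel once on every process, instead of
# shipping A and the whole of v from the master with each remote call.
@everywhere begin
    const A = randn(1000, 1000)
    const numx = 10_000
    const v = randn(1000, numx)

    function do_stuff(r::UnitRange)
        for mm in r
            sum(A * v[:, mm])
        end
    end
end

# Hand each worker a disjoint range of columns; only the range itself
# is sent over the wire.
chunks = Base.splitrange(numx, nworkers())
refs = []
for (i, w) in enumerate(workers())
    r = @spawnat w do_stuff(chunks[i])
    push!(refs, r)
end
for r in refs
    fetch(r)    # block until every chunk is done
end
----------------------------

Compared with passing A and v as arguments to do_stuff, each task now ships a few bytes instead of something on the order of the 88 MB of A plus v, which removes the serialization and copying overhead from the communication path (the pressure on the shared L3 from four resident copies of the data remains, of course).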
