Just noticed that you're allocating memory on each iteration. If you have the patience to write out all those matrix operations explicitly, it should help. Alternatively, perhaps try ParallelAccelerator.
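To make the first suggestion concrete, here is a rough sketch of what writing the matrix operations out could look like. It assumes the arrays have the shapes used in the quoted test function (3x3xnl), and the names `trace4` and `myloop_noalloc` are made up for illustration. Each `inv_cl[:,:,l]` slice and each `*` in the original allocates a new array, whereas this version only touches scalars, using tr(A*B*A*C) = sum over a,c of (A*B)[a,c] * (A*C)[c,a].

```julia
# Hypothetical helper (not from the original code): trace of A*B*A*C for the
# l-th slices of three n x n x nl arrays, using scalar loops only, so no
# temporary arrays are allocated.
function trace4(A, B, C, l)
    n = size(A, 1)
    acc = 0.0
    @inbounds for a in 1:n, c in 1:n
        p = 0.0   # (A*B)[a,c] for slice l
        q = 0.0   # (A*C)[c,a] for slice l
        for k in 1:n
            p += A[a,k,l] * B[k,c,l]
            q += A[c,k,l] * C[k,a,l]
        end
        acc += p * q
    end
    return acc
end

# Serial version of the loop; each (i,j) writes to a distinct element of fish.
function myloop_noalloc(inv_cl, d_cl, fish, ijs, nl)
    for ij in ijs
        i, j = ij
        di, dj = d_cl[i], d_cl[j]   # look up the Dict entries once per pair
        s = 0.0
        for l in 1:nl
            s += (2*l+1)/2 * trace4(inv_cl, di, dj, l)
        end
        fish[i,j] += s
    end
    return fish
end
```

It is called the same way as `myloop` in the quoted message, e.g. `myloop_noalloc(inv_cl,d_cl,fish,ijs,nl)`. The outer loop is left serial here; whether putting `@threads` back on it then gives the hoped-for scaling is something to measure rather than assume.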
Best,
--Tim

On Monday, August 29, 2016 10:49:40 AM CDT Marius Millea wrote:
> Thanks, just tried wrapping the for loop inside a function and it seems to
> make the @threads version slightly slower and serial version slightly
> faster, so I'm even further from the speedup I was hoping for! Reading
> through that Issue and linked ones, I guess I may not be the only one
> seeing this.
>
> For ref, what I did:
>
> function myloop(inv_cl,d_cl,fish,ijs,nl)
>     @threads for ij in ijs
>         i,j = ij
>         for l in 1:nl
>             fish[i,j] += (2*l+1)/2*trace(inv_cl[:,:,l]*d_cl[i][:,:,l]*inv_cl[:,:,l]*d_cl[j][:,:,l])
>         end
>     end
> end
>
> function test(nl,np)
>     inv_cl = ones(3,3,nl)
>     d_cl = Dict(i => ones(3,3,nl) for i=1:np)
>
>     fish = zeros(np,np)
>     ijs = [(i,j) for i=1:np, j=1:np]
>
>     myloop(inv_cl,d_cl,fish,ijs,nl)
> end
>
> # with @threads
> @timeit test(3000,40)
> 1 loops, best of 3: 3.84 s per loop
>
> # without @threads
> @timeit test(3000,40)
> 1 loops, best of 3: 2.33 s per loop
>
> On Monday, August 29, 2016 at 6:50:15 PM UTC+2, Tim Holy wrote:
> > Very quickly (train to catch!): try this
> > https://github.com/JuliaLang/julia/issues/17395#issuecomment-241911387
> > and see if it helps.
> >
> > --Tim
> >
> > On Monday, August 29, 2016 9:22:09 AM CDT Marius Millea wrote:
> > > I've parallelized some code with @threads, but instead of a factor NCPUs
> > > speed improvement (for me, 8), I'm seeing rather a bit under a factor 2. I
> > > suppose the answer may be that my bottleneck isn't computation, rather
> > > memory access. But during running the code, I see my CPU usage go to 100%
> > > on all 8 CPUs, if it were memory access would I still see this? Maybe the
> > > answer is yes, in which case memory access is likely the culprit; is there
> > > some way to confirm this though? If no, how do I figure out what *is* the
> > > culprit?
> > >
> > > Here's a stripped down version of my code,
> > >
> > > function test(nl,np)
> > >     inv_cl = ones(3,3,nl)
> > >     d_cl = Dict(i => ones(3,3,nl) for i=1:np)
> > >
> > >     fish = zeros(np,np)
> > >     ijs = [(i,j) for i=1:np, j=1:np]
> > >
> > >     Threads.@threads for ij in ijs
> > >         i,j = ij
> > >         for l in 1:nl
> > >             fish[i,j] += (2*l+1)/2*trace(inv_cl[:,:,l]*d_cl[i][:,:,l]*inv_cl[:,:,l]*d_cl[j][:,:,l])
> > >         end
> > >     end
> > > end
> > >
> > > # with the @threads
> > > @timeit test(3000,40)
> > > 1 loops, best of 3: 3.17 s per loop
> > >
> > > # now remove the @threads from above
> > > @timeit test(3000,40)
> > > 1 loops, best of 3: 4.42 s per loop
> > >
> > > Thanks.
