Very quickly (train to catch!): try this https://github.com/JuliaLang/julia/issues/17395#issuecomment-241911387 and see if it helps.
--Tim

On Monday, August 29, 2016 9:22:09 AM CDT Marius Millea wrote:
> I've parallelized some code with @threads, but instead of a factor-NCPUs
> speed improvement (for me, 8), I'm seeing a bit under a factor of 2. I
> suppose the answer may be that my bottleneck isn't computation but rather
> memory access. But while the code is running, I see my CPU usage go to 100%
> on all 8 CPUs; if it were memory access, would I still see this? Maybe the
> answer is yes, in which case memory access is likely the culprit; is there
> some way to confirm this, though? If not, how do I figure out what *is* the
> culprit?
>
> Here's a stripped-down version of my code:
>
> function test(nl,np)
>     inv_cl = ones(3,3,nl)
>     d_cl = Dict(i => ones(3,3,nl) for i=1:np)
>
>     fish = zeros(np,np)
>     ijs = [(i,j) for i=1:np, j=1:np]
>
>     Threads.@threads for ij in ijs
>         i,j = ij
>         for l in 1:nl
>             fish[i,j] += (2*l+1)/2*trace(inv_cl[:,:,l]*d_cl[i][:,:,l]*inv_cl[:,:,l]*d_cl[j][:,:,l])
>         end
>     end
> end
>
> # with the @threads
> @timeit test(3000,40)
> 1 loops, best of 3: 3.17 s per loop
>
> # now remove the @threads from above
> @timeit test(3000,40)
> 1 loops, best of 3: 4.42 s per loop
>
> Thanks.
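For context on the linked suggestion: every slice expression like `inv_cl[:,:,l]*d_cl[i][:,:,l]` in the quoted loop allocates fresh 3x3 temporaries, and heavy allocation inside a threaded loop tends to serialize the threads on allocation/GC, which would explain the poor scaling. A minimal sketch of an allocation-free variant, assuming that is the bottleneck (`test_noalloc` is a hypothetical name, and this is written in current Julia syntax, where `trace` has since become `LinearAlgebra.tr`; it is not code from the thread):

```julia
# Sketch: compute tr(A*B*C*D) with scalar loops instead of building the
# matrix products, so the threaded loop allocates no temporaries.
function test_noalloc(nl, np)
    inv_cl = ones(3, 3, nl)
    d_cl   = Dict(i => ones(3, 3, nl) for i in 1:np)
    fish   = zeros(np, np)
    ijs    = [(i, j) for i in 1:np, j in 1:np]

    Threads.@threads for ij in ijs
        i, j = ij
        di, dj = d_cl[i], d_cl[j]
        acc = 0.0
        for l in 1:nl
            # tr(A*B*C*D) = sum over a,b,c,d of A[a,b]*B[b,c]*C[c,d]*D[d,a]
            t = 0.0
            for a in 1:3, b in 1:3, c in 1:3, d in 1:3
                t += inv_cl[a, b, l] * di[b, c, l] * inv_cl[c, d, l] * dj[d, a, l]
            end
            acc += (2l + 1) / 2 * t
        end
        fish[i, j] = acc  # one write per (i,j) instead of nl read-modify-writes
    end
    return fish
end
```

The same trick of accumulating into a thread-local scalar and writing the shared array once per iteration also cuts down on cache-line contention between threads.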