Very quickly (train to catch!): try this
https://github.com/JuliaLang/julia/issues/17395#issuecomment-241911387
and see if it helps.
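
In case you can't dig through that thread right away: I haven't re-checked exactly what that comment says, but the usual workaround it points at (assuming it is the closure/boxing issue that @threads can run into) is a function barrier, i.e. moving the threaded loop body into its own function so the hot code stays type-stable. A rough sketch against your example below; the names fish_element! and test_barrier are just made up for this sketch:

function fish_element!(fish, inv_cl, d_cl, i, j, nl)
    # same per-(i,j) accumulation as in the original loop body,
    # but compiled as its own type-stable function
    acc = 0.0
    for l in 1:nl
        acc += (2*l+1)/2 * trace(inv_cl[:,:,l]*d_cl[i][:,:,l]*inv_cl[:,:,l]*d_cl[j][:,:,l])
    end
    fish[i,j] = acc
end

function test_barrier(nl, np)
    inv_cl = ones(3,3,nl)
    d_cl   = Dict(i => ones(3,3,nl) for i=1:np)
    fish   = zeros(np,np)
    ijs    = [(i,j) for i=1:np, j=1:np]
    Threads.@threads for ij in ijs
        i, j = ij
        # all of the numeric work now happens inside the helper function
        fish_element!(fish, inv_cl, d_cl, i, j, nl)
    end
    fish
end

Timing test_barrier against your original test should show fairly quickly whether the closure built by @threads is what's eating the speedup.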

--Tim

On Monday, August 29, 2016 9:22:09 AM CDT Marius Millea wrote:
> I've parallelized some code with @threads, but instead of a factor-of-NCPUs
> speedup (for me, 8x), I'm seeing a bit under a factor of 2. I suppose the
> answer may be that my bottleneck isn't computation but rather memory access.
> However, while the code is running I see CPU usage go to 100% on all 8 CPUs;
> if it were memory access, would I still see this? Maybe the answer is yes,
> in which case memory access is likely the culprit; is there some way to
> confirm this, though? If not, how do I figure out what *is* the culprit?
> 
> Here's a stripped-down version of my code:
> 
> 
> function test(nl,np)
> 
>     inv_cl = ones(3,3,nl)
>     d_cl = Dict(i => ones(3,3,nl) for i=1:np)
> 
>     fish = zeros(np,np)
>     ijs = [(i,j) for i=1:np, j=1:np]
> 
>     Threads.@threads for ij in ijs
>         i,j = ij
>         for l in 1:nl
>             fish[i,j] += (2*l+1)/2 * trace(inv_cl[:,:,l]*d_cl[i][:,:,l]*inv_cl[:,:,l]*d_cl[j][:,:,l])
>         end
>     end
> 
> end
> 
> 
> # with the @threads
> @timeit test(3000,40)
> 1 loops, best of 3: 3.17 s per loop
> 
> # now remove the @threads from above
> @timeit test(3000,40)
> 1 loops, best of 3: 4.42 s per loop
> 
> 
> 
> Thanks.
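
As for telling whether the loop is memory/GC-bound rather than compute-bound: full CPU usage on every core doesn't rule it out, since threads that are stalled on memory or busy allocating and collecting still show up as 100% busy. One quick check is what @time reports for allocations and GC, because every slice like inv_cl[:,:,l] and every matrix product in the loop body allocates a temporary array:

# Rough diagnostic sketch: @time prints the total allocation count/size and the
# fraction of time spent in garbage collection; @allocated returns the bytes
# allocated by the call. A large allocation count or GC share points at memory
# management rather than raw arithmetic as the scaling limit.
@time test(3000,40)
@allocated test(3000,40)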


Reply via email to