I've parallelized some code with @threads, but instead of a factor-of-NCPUs 
speedup (8 for me), I'm seeing a bit under a factor of 2. I suppose the 
answer may be that my bottleneck is memory access rather than computation. 
But while the code runs, I see CPU usage at 100% on all 8 cores; if it were 
memory access, would I still see this? Maybe the answer is yes, in which 
case memory access is likely the culprit; is there some way to confirm this, 
though? If no, how do I figure out what *is* the culprit? 
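One diagnostic I'm aware of (though I'm not sure it settles the question): `@time` reports total allocation alongside elapsed time, so if the loop is allocating huge amounts of temporaries, GC pressure rather than raw memory bandwidth could be what limits the threaded scaling. Something like:

```julia
# If the reported allocation is large (e.g. many GB for a few-second run),
# garbage collection could be serializing the threads.
@time test(3000, 40)
```

Is that a reasonable way to distinguish GC pressure from genuine bandwidth limits, or is there a better tool for this?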

Here's a stripped-down version of my code:


function test(nl,np)

    inv_cl = ones(3,3,nl)
    d_cl = Dict(i => ones(3,3,nl) for i=1:np)

    fish = zeros(np,np)
    ijs = [(i,j) for i=1:np, j=1:np]

    Threads.@threads for ij in ijs
        i,j = ij
        for l in 1:nl
            fish[i,j] += (2*l+1)/2*trace(inv_cl[:,:,l]*d_cl[i][:,:,l]*inv_cl[:,:,l]*d_cl[j][:,:,l])
        end
    end

    return fish

end


# with the @threads
@timeit test(3000,40)
1 loops, best of 3: 3.17 s per loop

# now remove the @threads from above
@timeit test(3000,40)
1 loops, best of 3: 4.42 s per loop
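For what it's worth, every iteration of the inner loop above copies four 3x3 slices and allocates intermediate products, so one thing I could try is cutting the slice copies with `@views`. A sketch of that variant (`test_views` is just my name for it; the matrix products themselves still allocate, so this only removes the slice copies):

```julia
# Same computation as test(), but @views makes the slices views
# instead of copies, reducing per-iteration allocation.
function test_views(nl, np)
    inv_cl = ones(3,3,nl)
    d_cl = Dict(i => ones(3,3,nl) for i=1:np)

    fish = zeros(np, np)
    ijs = [(i,j) for i=1:np, j=1:np]

    Threads.@threads for ij in ijs
        i, j = ij
        for l in 1:nl
            # @views applies to the whole expression, so all four slices
            # below become views rather than freshly allocated arrays.
            @views fish[i,j] += (2*l+1)/2 *
                trace(inv_cl[:,:,l] * d_cl[i][:,:,l] * inv_cl[:,:,l] * d_cl[j][:,:,l])
        end
    end

    return fish
end
```

Would reduced allocation like this be expected to improve the threaded scaling, or is the bottleneck elsewhere?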



Thanks. 
