Thanks. I just tried wrapping the for loop inside a function, and it seems to
make the @threads version slightly slower and the serial version slightly
faster, so I'm even further from the speedup I was hoping for! Reading
through that issue and the linked ones, I guess I may not be the only one
seeing this.
For reference, here is what I did:
using Base.Threads  # brings @threads into scope

function myloop(inv_cl,d_cl,fish,ijs,nl)
    @threads for ij in ijs
        i,j = ij
        for l in 1:nl
            fish[i,j] +=
                (2*l+1)/2*trace(inv_cl[:,:,l]*d_cl[i][:,:,l]*inv_cl[:,:,l]*d_cl[j][:,:,l])
        end
    end
end

function test(nl,np)
    inv_cl = ones(3,3,nl)
    d_cl = Dict(i => ones(3,3,nl) for i=1:np)
    fish = zeros(np,np)
    ijs = [(i,j) for i=1:np, j=1:np]
    myloop(inv_cl,d_cl,fish,ijs,nl)
end
# with @threads
@timeit test(3000,40)
1 loops, best of 3: 3.84 s per loop
# without @threads
@timeit test(3000,40)
1 loops, best of 3: 2.33 s per loop
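
One thing that might be worth checking is how much of the time goes into allocation rather than arithmetic: every inv_cl[:,:,l] slice makes a copy, and each matrix product allocates a temporary, so all threads put pressure on the garbage collector. Below is a minimal sketch, assuming Julia 1.x (where trace is spelled tr and lives in LinearAlgebra). The names fisher_entry and test_local are made up for illustration. It uses views instead of slice copies, a Vector instead of the Dict, and a local accumulator instead of updating fish[i,j] inside the inner loop. The products still allocate temporaries, and whether this helps the threaded timing here is untested.

```julia
using LinearAlgebra   # tr (assumption: Julia 1.x, where trace was renamed tr)
using Base.Threads    # @threads

# Hypothetical helper: computes one (i,j) entry with a local accumulator.
function fisher_entry(inv_cl, di, dj, nl)
    acc = 0.0
    for l in 1:nl
        A = view(inv_cl, :, :, l)   # view: no copy of the 3x3 slice
        acc += (2l + 1) / 2 * tr(A * view(di, :, :, l) * A * view(dj, :, :, l))
    end
    return acc
end

# Hypothetical variant of test(); returns fish so the result can be inspected.
function test_local(nl, np)
    inv_cl = ones(3, 3, nl)
    d_cl = [ones(3, 3, nl) for _ in 1:np]   # Vector instead of Dict
    fish = zeros(np, np)
    ijs = [(i, j) for i in 1:np, j in 1:np]
    @threads for k in 1:length(ijs)
        i, j = ijs[k]
        # each iteration writes a distinct element, so there is no data race
        fish[i, j] = fisher_entry(inv_cl, d_cl[i], d_cl[j], nl)
    end
    return fish
end
```

Running `@time test_local(3000, 40)` reports the total allocations and, when nonzero, the percentage of time spent in GC, which gives a rough indication of whether memory traffic rather than computation dominates.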
On Monday, August 29, 2016 at 6:50:15 PM UTC+2, Tim Holy wrote:
>
> Very quickly (train to catch!): try this
> https://github.com/JuliaLang/julia/issues/17395#issuecomment-241911387
> and see if it helps.
>
> --Tim
>
> On Monday, August 29, 2016 9:22:09 AM CDT Marius Millea wrote:
> > I've parallelized some code with @threads, but instead of a factor NCPUs
> > speed improvement (for me, 8), I'm seeing rather a bit under a factor 2. I
> > suppose the answer may be that my bottleneck isn't computation, rather
> > memory access. But during running the code, I see my CPU usage go to 100%
> > on all 8 CPUs, if it were memory access would I still see this? Maybe the
> > answer is yes, in which case memory access is likely the culprit; is there
> > some way to confirm this though? If no, how do I figure out what *is* the
> > culprit?
> >
> > Here's a stripped down version of my code,
> >
> >
> > function test(nl,np)
> >
> >     inv_cl = ones(3,3,nl)
> >     d_cl = Dict(i => ones(3,3,nl) for i=1:np)
> >
> >     fish = zeros(np,np)
> >     ijs = [(i,j) for i=1:np, j=1:np]
> >
> >     Threads.@threads for ij in ijs
> >         i,j = ij
> >         for l in 1:nl
> >             fish[i,j] +=
> >                 (2*l+1)/2*trace(inv_cl[:,:,l]*d_cl[i][:,:,l]*inv_cl[:,:,l]*d_cl[j][:,:,l])
> >         end
> >     end
> >
> > end
> >
> >
> > # with the @threads
> > @timeit test(3000,40)
> > 1 loops, best of 3: 3.17 s per loop
> >
> > # now remove the @threads from above
> > @timeit test(3000,40)
> > 1 loops, best of 3: 4.42 s per loop
> >
> >
> >
> > Thanks.
>
>
>