Just noticed that you're allocating memory on each iteration: every slice like
inv_cl[:,:,l] makes a copy, and every matrix product allocates a new array. If
you have the patience to write out all those matrix operations explicitly, it
should help. Alternatively, perhaps try ParallelAccelerator.
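
Below is a minimal sketch of what "writing out the matrix operations explicitly" could look like. It assumes Julia 1.x and the same array layout as the code quoted below (3x3xnl arrays, with d_cl a Dict of them). trace_prod and myloop_noalloc! are hypothetical names introduced only for illustration. The idea is to expand tr(A*B*A*C) index by index, so that no slice copies or temporary 3x3 products are created inside the threaded loop.

```julia
using Base.Threads: @threads

# Hypothetical helper: tr(A*B*A*C) for the l-th 3x3 slices, computed as
#   sum over p,q,r,t of A[p,q] * B[q,r] * A[r,t] * C[t,p]
# with scalar loops, so no slice copies or temporary products are allocated.
function trace_prod(A::Array{Float64,3}, B::Array{Float64,3},
                    C::Array{Float64,3}, l::Int)
    n = size(A, 1)
    s = 0.0
    @inbounds for p in 1:n, q in 1:n, r in 1:n, t in 1:n
        s += A[p, q, l] * B[q, r, l] * A[r, t, l] * C[t, p, l]
    end
    return s
end

# Hypothetical variant of myloop from the thread: the Dict lookups are hoisted
# out of the l loop, and each fish[i,j] is written once from a local accumulator.
function myloop_noalloc!(fish, inv_cl, d_cl, ijs, nl)
    @threads for k in eachindex(ijs)
        i, j = ijs[k]
        di, dj = d_cl[i], d_cl[j]
        acc = 0.0
        for l in 1:nl
            acc += (2l + 1) / 2 * trace_prod(inv_cl, di, dj, l)
        end
        fish[i, j] += acc
    end
    return fish
end
```

Each (i,j) pair is handled by exactly one iteration, so the threads never write to the same fish entry. The result should match the original expression up to floating-point rounding.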

Best,
--Tim

On Monday, August 29, 2016 10:49:40 AM CDT Marius Millea wrote:
> Thanks. I just tried wrapping the for loop inside a function, and it seems to
> make the @threads version slightly slower and the serial version slightly
> faster, so I'm even further from the speedup I was hoping for! Reading
> through that issue and the linked ones, I guess I'm not the only one
> seeing this.
> 
> For ref, what I did:
> 
> using Base.Threads
> 
> function myloop(inv_cl,d_cl,fish,ijs,nl)
>     @threads for ij in ijs
>         i,j = ij
>         for l in 1:nl
>             fish[i,j] += (2*l+1)/2*trace(inv_cl[:,:,l]*d_cl[i][:,:,l]*inv_cl[:,:,l]*d_cl[j][:,:,l])
>         end
>     end
> end
> 
> function test(nl,np)
>     inv_cl = ones(3,3,nl)
>     d_cl = Dict(i => ones(3,3,nl) for i=1:np)
> 
>     fish = zeros(np,np)
>     ijs = [(i,j) for i=1:np, j=1:np]
> 
>     myloop(inv_cl,d_cl,fish,ijs,nl)
> end
> 
> # with @threads
> @timeit test(3000,40)
> 1 loops, best of 3: 3.84 s per loop
> 
> # without @threads
> @timeit test(3000,40)
> 1 loops, best of 3: 2.33 s per loop
> 
> On Monday, August 29, 2016 at 6:50:15 PM UTC+2, Tim Holy wrote:
> > Very quickly (train to catch!): try this
> > https://github.com/JuliaLang/julia/issues/17395#issuecomment-241911387
> > and see if it helps.
> > 
> > --Tim
> > 
> > On Monday, August 29, 2016 9:22:09 AM CDT Marius Millea wrote:
> > > I've parallelized some code with @threads, but instead of a factor-of-NCPUs
> > > speed improvement (for me, 8), I'm seeing a bit under a factor of 2. I
> > > suppose the answer may be that my bottleneck isn't computation but rather
> > > memory access. But while the code runs, I see CPU usage go to 100% on all
> > > 8 CPUs; if it were memory access, would I still see this? Maybe the answer
> > > is yes, in which case memory access is likely the culprit; is there some
> > > way to confirm this, though? If not, how do I figure out what *is* the
> > > culprit?
> > > 
> > > Here's a stripped-down version of my code:
> > > 
> > > function test(nl,np)
> > >     inv_cl = ones(3,3,nl)
> > >     d_cl = Dict(i => ones(3,3,nl) for i=1:np)
> > > 
> > >     fish = zeros(np,np)
> > >     ijs = [(i,j) for i=1:np, j=1:np]
> > > 
> > >     Threads.@threads for ij in ijs
> > >         i,j = ij
> > >         for l in 1:nl
> > >             fish[i,j] += (2*l+1)/2*trace(inv_cl[:,:,l]*d_cl[i][:,:,l]*inv_cl[:,:,l]*d_cl[j][:,:,l])
> > >         end
> > >     end
> > > end
> > > 
> > > 
> > > # with the @threads
> > > @timeit test(3000,40)
> > > 1 loops, best of 3: 3.17 s per loop
> > > 
> > > # now remove the @threads from above
> > > @timeit test(3000,40)
> > > 1 loops, best of 3: 4.42 s per loop
> > > 
> > > 
> > > 
> > > Thanks.
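
On the question of how to confirm what the bottleneck is: one hedged way to check whether allocation is involved is to measure how many bytes a single term of the inner loop allocates. This sketch assumes Julia 1.x, where trace was renamed tr and moved to the LinearAlgebra standard library. term_alloc and bytes_per_term are hypothetical names. Running the full test(3000,40) under @time also reports total allocations and the percentage of time spent in GC.

```julia
using LinearAlgebra  # provides tr (called trace in the Julia 0.5 code above)

# The per-term expression from the thread, wrapped so it can be measured alone.
# Each inv_cl[:,:,l] / di[:,:,l] slice copies, and each * allocates a result.
term_alloc(inv_cl, di, dj, l) =
    tr(inv_cl[:, :, l] * di[:, :, l] * inv_cl[:, :, l] * dj[:, :, l])

# Measure inside a function (not at global scope) so untyped globals don't
# add their own allocations to the count.
function bytes_per_term(nl)
    inv_cl = ones(3, 3, nl)
    di = ones(3, 3, nl)
    dj = ones(3, 3, nl)
    term_alloc(inv_cl, di, dj, 1)                # run once so compilation isn't counted
    return @allocated term_alloc(inv_cl, di, dj, 1)
end

bytes = bytes_per_term(10)
println("bytes allocated per term: ", bytes)
println("threads available: ", Threads.nthreads())
```

Multiplying the per-term figure by nl*np^2 (3000*40^2 = 4.8 million terms for test(3000,40)) gives a rough idea of how much garbage the loop produces. If that number is large, the allocation-free rewrite is worth trying before concluding the problem is memory bandwidth.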

