Tomas

In your example, different threads access neighbouring elements of the
array f. These elements are likely to sit in the same cache line, so
every update by one thread invalidates that line in the other cores'
caches, leading to much unnecessary inter-CPU communication. This is
called "false sharing". Leaving a gap between the elements used by each
thread might speed things up.
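
To illustrate the gap idea, here is a rough sketch, not a tuned
implementation. The name two_padded and the stride of 8 are my own
choices; 8 Float64 slots are 64 bytes, which assumes a 64-byte cache
line. I also split the index range by hand instead of indexing with
threadid(), so that each chunk owns exactly one slot:

```julia
using Base.Threads

# Hypothetical padded variant of two(). Each chunk accumulates into its own
# slot, and the slots are `stride` Float64 elements apart (8 * 8 bytes = 64
# bytes, an assumed cache line size), so no two slots share a cache line.
function two_padded(x; stride = 8)
    nt = nthreads()
    n = length(x)
    f = zeros(stride * nt)
    @threads for t in 1:nt
        slot = stride * (t - 1) + 1      # padded accumulator for chunk t
        lo = div((t - 1) * n, nt) + 1    # first index of chunk t
        hi = div(t * n, nt)              # last index of chunk t
        @inbounds for i in lo:hi
            f[slot] += x[i]
        end
    end
    sum(f)
end
```

Accumulating into a local variable inside each chunk and storing it once
at the end would avoid the shared cache line as well.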

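Your example is also memory-bound, not compute-bound. Most of the time
will be spent loading the elements of x from memory, not performing a
computation. Since memory bandwidth scales with the number of sockets
rather than the number of cores, on a two-socket machine you can expect
a speedup of at most about two.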

Finally, function one is quite simple, and there's a good chance it
will be vectorized and/or unrolled. I'm less certain about function
two. If function two is not vectorized, you lose a factor of between
two and sixteen there.

Distributing work over threads has an overhead. It might be better to
use an example that has a non-trivial amount of work per thread, e.g.
sqrt(sqrt(sqrt(...(sqrt(x))))), with ten or a hundred sqrt invocations.
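
A possible version of such a test, as a sketch only. The function names
are made up for illustration, and I take abs(x[i]) because randn
produces negative values, for which sqrt would throw a DomainError:

```julia
using Base.Threads

# Hypothetical kernel: apply sqrt n times to v.
function nested_sqrt(v, n)
    for _ in 1:n
        v = sqrt(v)
    end
    v
end

# Single-threaded reference: y[i] = nested sqrt of |x[i]|.
function heavy_serial!(y, x, n)
    for i in 1:length(x)
        @inbounds y[i] = nested_sqrt(abs(x[i]), n)
    end
    y
end

# Threaded version. Each iteration writes only its own y[i], so there is
# no shared accumulator between threads.
function heavy_threaded!(y, x, n)
    @threads for i in 1:length(x)
        @inbounds y[i] = nested_sqrt(abs(x[i]), n)
    end
    y
end
```

You would time it the same way as before, e.g. with x = randn(1000000)
and y = similar(x), calling each function once to compile it and then
comparing @time heavy_serial!(y, x, 100) with
@time heavy_threaded!(y, x, 100).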

-erik


On Thu, Mar 3, 2016 at 2:39 PM,  <[email protected]> wrote:
> Hi All,
> I would like to ask if someone has an experience with Threads as they are
> implemented at the moment in the master branch.
> After the successful compilation  (put JULIA_THREADS=1 to Make.user)
> I have played with different levels of granularity, but usually the code was
> slower or more or less the same speed as single threaded version. I have
> even tried a totally stupid execution like this
>
> using Base.Threads;
> function one()
>   x=randn(1000000);
>   f=0;
>   for i in x
>     f+=i;
>   end
> end
>
> function two()
>   x=randn(1000000);
>   f=zeros(nthreads())
>   @inbounds @threads for i in 1:length(x)
>     f[threadid()]+=x[i];
>   end
>   sum(f)
> end
>
> one()
> @time one()
>
> two()
> @time two()
>
> and the times on my 2013 Macbook air were
>
>   0.068617 seconds (2.00 M allocations: 38.157 MB, 9.72% gc time)
>   0.394164 seconds (5.72 M allocations: 99.015 MB, 5.00% gc time)
>
> Wow, that is quite poor. I would expect an overhead, but not as big as this.
>
> Can anyone suggest, what is going wrong? I have been trying a profiler, but
> it does not help. It seems that it does not work with Threads at the moment.
> Or, is it because Threads are still not really supported.
>
> I would like to get the speed-up shown in this video
> https://www.youtube.com/watch?v=GvLhseZ4D8M
>
> Any suggestions welcomed.
> Tomas



-- 
Erik Schnetter <[email protected]>
http://www.perimeterinstitute.ca/personal/eschnetter/
