Tomas,

In your example, different threads access nearby elements of the array f. These elements are likely to be in the same cache line, leading to much unnecessary inter-CPU communication. This is called "false sharing". Leaving a gap between the elements used by each thread might speed things up.
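To make the false-sharing point concrete, here is a minimal sketch of what "leaving a gap" could look like. It assumes 64-byte cache lines (eight Float64 values per line) and a reasonably recent Julia started with more than one thread. The names two_padded and STRIDE are made up for illustration and are not part of the original example. Rather than indexing by threadid(), it splits the range into one contiguous chunk per thread, which is a design choice of this sketch.

```julia
using Base.Threads

# Assumption: 64-byte cache lines, i.e. 8 Float64 values per line.
const STRIDE = 8

# Hypothetical variant of `two` that takes the data as an argument and
# spaces the per-thread accumulators one cache line apart.
function two_padded(x::Vector{Float64})
    nt = nthreads()
    n = length(x)
    # One accumulator slot per chunk, STRIDE elements (64 bytes) apart,
    # so no two chunks ever write to the same cache line.
    f = zeros(Float64, STRIDE * nt)
    @threads for t in 1:nt
        # Chunk t covers a contiguous block of x; the chunks do not overlap.
        lo = div((t - 1) * n, nt) + 1
        hi = div(t * n, nt)
        slot = STRIDE * (t - 1) + 1
        @inbounds for i in lo:hi
            f[slot] += x[i]
        end
    end
    return sum(f)
end
```

For example, with x = randn(1000000), two_padded(x) should agree with sum(x) up to rounding. Whether it is actually faster than the unpadded version depends on the machine, so it is worth timing both.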
Your example is also memory-bound, not compute-bound. Most of the time will be spent accessing the elements of x, not performing a computation. On a two-socket machine, you can expect at most a speedup of two.

Finally, function one is quite simple, and there's a good chance it will be vectorized and/or unrolled. I'm less certain about function two. If function two is not vectorized, you lose a factor of between two and sixteen.

Distributing work over threads has an overhead. It might be better to use an example that has a non-trivial amount of work per thread, e.g. sqrt(sqrt(sqrt(...sqrt(x)...))), with ten or a hundred sqrt invocations.

-erik

On Thu, Mar 3, 2016 at 2:39 PM, <[email protected]> wrote:
> Hi All,
> I would like to ask if someone has experience with Threads as they are
> implemented at the moment in the master branch.
> After the successful compilation (put JULIA_THREADS=1 to Make.user)
> I have played with different levels of granularity, but usually the code was
> slower or more or less the same speed as the single-threaded version. I have
> even tried a totally stupid execution like this
>
> using Base.Threads;
> function one()
>   x=randn(1000000);
>   f=0;
>   for i in x
>     f+=i;
>   end
> end
>
> function two()
>   x=randn(1000000);
>   f=zeros(nthreads())
>   @inbounds @threads for i in 1:length(x)
>     f[threadid()]+=x[i];
>   end
>   sum(f)
> end
>
> one()
> @time one()
>
> two()
> @time two()
>
> and the times on my 2013 Macbook air were
>
> 0.068617 seconds (2.00 M allocations: 38.157 MB, 9.72% gc time)
> 0.394164 seconds (5.72 M allocations: 99.015 MB, 5.00% gc time)
>
> Wow, that is quite poor. I would expect an overhead, but not as big as this.
>
> Can anyone suggest what is going wrong? I have been trying a profiler, but
> it does not help. It seems that it does not work with Threads at the moment.
> Or is it because Threads are still not really supported?
>
> I would like to get the speed-up shown in this video
> https://www.youtube.com/watch?v=GvLhseZ4D8M
>
> Any suggestions welcomed.
> Tomas

--
Erik Schnetter <[email protected]>
http://www.perimeterinstitute.ca/personal/eschnetter/
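
As a follow-up to the suggestion of giving each thread a non-trivial amount of work, here is a rough sketch of a compute-heavier test using repeated sqrt calls. The function names nested_sqrt, heavy_serial and heavy_threaded are hypothetical, and the code assumes a recent Julia started with several threads. The input must be non-negative (e.g. from rand rather than randn), since sqrt of a negative Float64 throws a DomainError.

```julia
using Base.Threads

# Hypothetical helper: apply sqrt k times, so each element costs real work.
function nested_sqrt(v::Float64, k::Int)
    for _ in 1:k
        v = sqrt(v)
    end
    return v
end

# Single-threaded reference version.
function heavy_serial(x::Vector{Float64}, k::Int)
    s = 0.0
    for v in x
        s += nested_sqrt(v, k)
    end
    return s
end

# Threaded version: one contiguous chunk of x per thread.
function heavy_threaded(x::Vector{Float64}, k::Int)
    nt = nthreads()
    n = length(x)
    partial = zeros(Float64, nt)
    @threads for t in 1:nt
        lo = div((t - 1) * n, nt) + 1
        hi = div(t * n, nt)
        s = 0.0                  # accumulator local to this chunk
        @inbounds for i in lo:hi
            s += nested_sqrt(x[i], k)
        end
        partial[t] = s           # a single write to shared memory per chunk
    end
    return sum(partial)
end
```

A possible comparison is x = rand(1000000) followed by @time heavy_serial(x, 100) and @time heavy_threaded(x, 100), each called once beforehand so that compilation is not included in the timing. Accumulating into a local variable and writing to the shared array only once per chunk also sidesteps the false-sharing issue.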
