btw, does the code you just sent work as is with your pull request branch?
------------------------------------------
Carlos


On Sun, May 18, 2014 at 1:04 PM, Carlos Becker <carlosbec...@gmail.com> wrote:

> Hi Tobias, I saw your pull request and have been following it closely,
> nice work ;)
>
> Though, in the case of element-wise matrix operations, like tanh, there is
> no need for extra allocations, since the buffer should be allocated only
> once.
>
> From your first code snippet, is Julia smart enough to pre-compute i*N/2?
> In such cases, creating a kind of array view on the original data would
> probably be faster, right? (Though I don't know how allocations work here.)
>
> For vectorize_1arg_openmp, I was thinking of "hard-coding" it for known
> operations such as trigonometric ones, which benefit a lot from
> multi-threading.
> I know this is a hack, but it is quick to implement and brings an amazing
> speed-up (8x in the case of the code I posted above).
>
> ------------------------------------------
> Carlos
>
>
> On Sun, May 18, 2014 at 12:30 PM, Tobias Knopp <tobias.kn...@googlemail.com> wrote:
>
>> Hi Carlos,
>>
>> I am working on something that will allow multithreading of Julia
>> functions (https://github.com/JuliaLang/julia/pull/6741). Implementing
>> vectorize_1arg_openmp is actually a lot less trivial, as the Julia runtime
>> is not thread safe (yet).
>>
>> Your example is great. I first got a slowdown by a factor of 10 because the
>> example revealed a locking issue. With a little trick I now get a speedup of
>> 1.75x on a 2-core machine. Not too bad, taking into account that memory
>> allocation cannot be parallelized.
>>
>> The tweaked code looks like
>>
>> function tanh_core(x,y,i)
>>     N=length(x)
>>     for l=1:N/2
>>         y[l+i*N/2] = tanh(x[l+i*N/2])
>>     end
>> end
>>
>> function ptanh(x;numthreads=2)
>>     y = similar(x)
>>     N = length(x)
>>     parapply(tanh_core,(x,y), 0:1, numthreads=numthreads)
>>     y
>> end
>>
>> I actually want this to also be fast for
>>
>> function tanh_core(x,y,i)
>>     y[i] = tanh(x[i])
>> end
>>
>> function ptanh(x;numthreads=2)
>>     y = similar(x)
>>     N = length(x)
>>     parapply(tanh_core,(x,y), 1:N, numthreads=numthreads)
>>     y
>> end
>>
>> On Sunday, May 18, 2014 11:40:13 UTC+2, Carlos Becker wrote:
>>
>>> Now that I think about it, maybe openblas has nothing to do with this,
>>> since @which tanh(y) leads to a call to vectorize_1arg().
>>>
>>> If that's the case, wouldn't it be advantageous to have a
>>> vectorize_1arg_openmp() function (defined in C/C++) that works for
>>> element-wise operations on scalar arrays,
>>> multi-threading with OpenMP?
>>>
>>>
>>> On Sunday, May 18, 2014 11:34:11 UTC+2, Carlos Becker wrote:
>>>>
>>>> Forgot to add versioninfo():
>>>>
>>>> julia> versioninfo()
>>>> Julia Version 0.3.0-prerelease+2921
>>>> Commit ea70e4d* (2014-05-07 17:56 UTC)
>>>> Platform Info:
>>>>   System: Linux (x86_64-linux-gnu)
>>>>   CPU: Intel(R) Xeon(R) CPU X5690 @ 3.47GHz
>>>>   WORD_SIZE: 64
>>>>   BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY)
>>>>   LAPACK: libopenblas
>>>>   LIBM: libopenlibm
>>>>
>>>>
>>>> On Sunday, May 18, 2014 11:33:45 UTC+2, Carlos Becker wrote:
>>>>>
>>>>> This is probably related to openblas, but it seems that tanh()
>>>>> is not multi-threaded, which hinders a considerable speed improvement.
>>>>> For example, MATLAB does multi-thread it and gets something around 3x
>>>>> speed-up over the single-threaded version.
>>>>>
>>>>> For example,
>>>>>
>>>>> x = rand(100000,200);
>>>>> @time y = tanh(x);
>>>>>
>>>>> yields:
>>>>> - 0.71 sec in Julia
>>>>> - 0.76 sec in MATLAB with -singleCompThread
>>>>> - and 0.09 sec in MATLAB (this one uses multi-threading by default)
>>>>>
>>>>> Good news is that Julia (w/openblas) is competitive with the MATLAB
>>>>> single-threaded version,
>>>>> though setting the env variable OPENBLAS_NUM_THREADS doesn't have any
>>>>> effect on the timings, nor do I see higher CPU usage with 'top'.
>>>>>
>>>>> Is there an override for OPENBLAS_NUM_THREADS in Julia? What am I
>>>>> missing?
>>>>>
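
For readers following this thread later: the chunked idea discussed above (one output buffer allocated once, each thread filling its own contiguous index range) can be sketched roughly as below. This is only an illustration under assumptions. It uses `Base.Threads` from current Julia releases rather than the `parapply` function from the pull request, and the names `tanh_chunk!` and `ptanh_sketch` are made up for this example. Start Julia with more than one thread (for instance `julia -t 4`) to see any parallel benefit.

```julia
# Sketch only: relies on Base.Threads from current Julia releases,
# NOT on `parapply` from the pull request discussed in this thread.
using Base.Threads

# Hypothetical helper: apply tanh over one contiguous range of linear indices.
function tanh_chunk!(y, x, range)
    @inbounds for k in range
        y[k] = tanh(x[k])
    end
    return nothing
end

# Hypothetical name: split the linear index range of `x` into `nchunks`
# contiguous pieces and let the threaded loop hand each piece to a thread.
function ptanh_sketch(x; nchunks = nthreads())
    y = similar(x)             # single output allocation, done serially
    n = length(x)
    chunk = cld(n, nchunks)    # integer ceiling division, so indices stay integers
    @threads for c in 1:nchunks
        lo = (c - 1) * chunk + 1
        hi = min(c * chunk, n)
        tanh_chunk!(y, x, lo:hi)   # lo:hi is empty when lo > hi
    end
    return y
end

x = rand(1000, 20)
y = ptanh_sketch(x)
```

The chunk bounds are computed once per chunk rather than once per element, which sidesteps the question of whether the compiler hoists the offset out of the inner loop. Whether this matches the speed-ups reported above depends on the machine and thread count and is not measured here.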