Re: [julia-users] Re: tanh() speed / multi-threading

Carlos Becker Sun, 18 May 2014 05:31:41 -0700

Btw, does the code you just sent work as is with your pull request branch?


------------------------------------------
Carlos


On Sun, May 18, 2014 at 1:04 PM, Carlos Becker <carlosbec...@gmail.com> wrote:

> Hi Tobias, I saw your pull request and have been following it closely,
> nice work ;)
>
> Though, in the case of element-wise matrix operations, like tanh, there is
> no need for extra allocations, since the buffer should be allocated only
> once.
>
> From your first code snippet, is Julia smart enough to pre-compute i*N/2?
> In such cases, creating a kind of array view on the original data would
> probably be faster, right? (though I don't know how allocations work here).
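
A minimal sketch of the two ideas in the question above: compute the chunk
range once outside the loop, and work on views of the original arrays.
It assumes a current Julia release (where view() is available), not the
0.3 prerelease discussed in this thread, and tanh_chunk! is a hypothetical
name. Like the quoted snippet, it ignores the last element when the length
is odd.

```julia
# Hypothetical helper: apply tanh to chunk i (i = 0 or 1) of x, writing into y.
function tanh_chunk!(y, x, i)
    half = div(length(x), 2)              # integer chunk length
    r = (i*half + 1):((i + 1)*half)       # index range of chunk i, computed once
    xv = view(x, r)                       # views share memory with x and y;
    yv = view(y, r)                       # the array data is not copied
    for l in eachindex(xv, yv)
        yv[l] = tanh(xv[l])
    end
    return y
end

x = rand(10)
y = similar(x)
tanh_chunk!(y, x, 0)   # first half
tanh_chunk!(y, x, 1)   # second half
```

Whether this is actually faster than indexing with an offset would have to
be measured; the sketch only shows the shape of the code.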
>
> For vectorize_1arg_openmp, I was thinking of "hard-coding" it for known
> operations such as trigonometric ones, that benefit a lot from
> multi-threading.
> I know this is a hack, but it is quick to implement and brings an amazing
> speedup (8x in the case of the code I posted above).
>
>
>
>
> ------------------------------------------
> Carlos
>
>
> On Sun, May 18, 2014 at 12:30 PM, Tobias Knopp <
> tobias.kn...@googlemail.com> wrote:
>
>> Hi Carlos,
>>
>> I am working on something that will allow multithreading of Julia
>> functions (https://github.com/JuliaLang/julia/pull/6741). Implementing
>> vectorize_1arg_openmp is actually far from trivial, as the Julia runtime
>> is not thread-safe (yet).
>>
>> Your example is great. I first got a 10x slowdown because the example
>> revealed a locking issue. With a little trick I now get a speedup of 1.75x
>> on a 2-core machine. Not too bad, taking into account that memory allocation
>> cannot be parallelized.
>>
>> The tweaked code looks like:
>>
>> function tanh_core(x,y,i)
>>     N = length(x)
>>     half = div(N, 2)    # integer chunk length (N/2 would be a Float64)
>>     for l = 1:half
>>         y[l+i*half] = tanh(x[l+i*half])    # chunk i covers i*half+1 : (i+1)*half
>>     end
>> end
>>
>> function ptanh(x;numthreads=2)
>>     y = similar(x)
>>     N = length(x)
>>     parapply(tanh_core,(x,y), 0:1, numthreads=numthreads)
>>     y
>> end
>>
>>
>> I actually want this to also be fast for:
>>
>>
>> function tanh_core(x,y,i)
>>     y[i] = tanh(x[i])
>> end
>>
>> function ptanh(x;numthreads=2)
>>     y = similar(x)
>>     N = length(x)
>>     parapply(tanh_core,(x,y), 1:N, numthreads=numthreads)
>>     y
>> end
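
For comparison, a minimal sketch of the same per-element pattern using
Base.Threads from a current Julia release. This is an assumption on my
side: it is not the parapply API from the pull request branch, and
ptanh_threads is a hypothetical name. Julia has to be started with more
than one thread (for example `julia -t 4`) for the loop to run in parallel.

```julia
# Hypothetical name; uses Threads.@threads instead of parapply.
function ptanh_threads(x)
    y = similar(x)                        # single output buffer, allocated once
    Threads.@threads for i in eachindex(x, y)
        y[i] = tanh(x[i])                 # each iteration writes a distinct element
    end
    return y
end

x = rand(1000, 20)
y = ptanh_threads(x)
```

With a single thread the function still returns the correct result; it
just runs serially.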
>>
>> On Sunday, 18 May 2014 11:40:13 UTC+2, Carlos Becker wrote:
>>
>>> Now that I think about it, maybe OpenBLAS has nothing to do with this, since
>>> @which tanh(y) leads to a call to vectorize_1arg().
>>>
>>> If that's the case, wouldn't it be advantageous to have a
>>> vectorize_1arg_openmp() function (defined in C/C++) that works for
>>> element-wise operations on scalar arrays,
>>> multi-threading with OpenMP?
>>>
>>>
>>> On Sunday, 18 May 2014 11:34:11 UTC+2, Carlos Becker wrote:
>>>>
>>>> Forgot to add versioninfo():
>>>>
>>>> julia> versioninfo()
>>>> Julia Version 0.3.0-prerelease+2921
>>>> Commit ea70e4d* (2014-05-07 17:56 UTC)
>>>> Platform Info:
>>>>   System: Linux (x86_64-linux-gnu)
>>>>   CPU: Intel(R) Xeon(R) CPU           X5690  @ 3.47GHz
>>>>   WORD_SIZE: 64
>>>>   BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY)
>>>>   LAPACK: libopenblas
>>>>   LIBM: libopenlibm
>>>>
>>>>
>>>> On Sunday, 18 May 2014 11:33:45 UTC+2, Carlos Becker wrote:
>>>>>
>>>>> This is probably related to OpenBLAS, but it seems that tanh() is not
>>>>> multi-threaded, which leaves a considerable speed improvement unused.
>>>>> For example, MATLAB does multi-thread it and gets around a 3x
>>>>> speed-up over the single-threaded version.
>>>>>
>>>>> For example,
>>>>>
>>>>>   x = rand(100000,200);
>>>>>   @time y = tanh(x);
>>>>>
>>>>> yields:
>>>>>   - 0.71 sec in Julia
>>>>>   - 0.76 sec in MATLAB with -singleCompThread
>>>>>   - 0.09 sec in MATLAB (this one uses multi-threading by default)
>>>>>
>>>>> The good news is that Julia (w/ OpenBLAS) is competitive with the MATLAB
>>>>> single-threaded version,
>>>>> though setting the environment variable OPENBLAS_NUM_THREADS has no
>>>>> effect on the timings, nor do I see higher CPU usage with 'top'.
>>>>>
>>>>> Is there an override for OPENBLAS_NUM_THREADS in Julia? What am I
>>>>> missing?
>>>>>
>>>>
>
