Now that I think about it, maybe OpenBLAS has nothing to do with this, since 
@which tanh(y) leads to a call to vectorize_1arg().

If that's the case, wouldn't it be advantageous to have a 
vectorize_1arg_openmp() function (defined in C/C++) that performs 
element-wise operations on scalar arrays, multi-threaded with OpenMP?
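
Roughly, I'm thinking of something like the C sketch below (the name, and the 
fact that it is hard-coded to tanh over double arrays, are just for 
illustration; a real version would presumably be generated per function and 
element type, and exposed to Julia through ccall):

    /* Illustrative OpenMP kernel: element-wise tanh over a double array.
       Compile with -fopenmp into a small shared library. */
    #include <math.h>
    #include <stddef.h>

    void vectorize_1arg_openmp(const double *x, double *y, size_t n)
    {
        #pragma omp parallel for
        for (ptrdiff_t i = 0; i < (ptrdiff_t)n; i++)
            y[i] = tanh(x[i]);
    }

With the thread count controlled by OMP_NUM_THREADS, something like this could 
in principle give a multi-threaded speed-up similar to what MATLAB shows below.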


On Sunday, May 18, 2014 at 11:34:11 UTC+2, Carlos Becker wrote:
>
> forgot to add versioninfo():
>
> julia> versioninfo()
> Julia Version 0.3.0-prerelease+2921
> Commit ea70e4d* (2014-05-07 17:56 UTC)
> Platform Info:
>   System: Linux (x86_64-linux-gnu)
>   CPU: Intel(R) Xeon(R) CPU           X5690  @ 3.47GHz
>   WORD_SIZE: 64
>   BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY)
>   LAPACK: libopenblas
>   LIBM: libopenlibm
>
>
> On Sunday, May 18, 2014 at 11:33:45 UTC+2, Carlos Becker wrote:
>>
>> This is probably related to OpenBLAS, but it seems that tanh() is 
>> not multi-threaded, which rules out a considerable speed improvement.
>> MATLAB, for example, does multi-thread it and gets roughly a 3x 
>> speed-up over the single-threaded version.
>>
>> For example,
>>
>>   x = rand(100000,200);
>>   @time y = tanh(x);
>>
>> yields:
>>   - 0.71 sec in Julia
>>   - 0.76 sec in MATLAB with -singleCompThread
>>   - 0.09 sec in MATLAB (which uses multi-threading by default)
>>
>> The good news is that Julia (w/ OpenBLAS) is competitive with the 
>> single-threaded MATLAB version, though setting the environment variable 
>> OPENBLAS_NUM_THREADS doesn't have any effect on the timings, nor do I see 
>> higher CPU usage with 'top'.
>>
>> Is there an override for OPENBLAS_NUM_THREADS in Julia? What am I missing?
>>
>
