Most likely. I would also time it with and without @simd at your problem size. For some reason I've had some simple loops do better without @simd.
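
A minimal sketch of that comparison, assuming Julia 1.x syntax (not the v0.4.5 discussed in this thread) and only Base. The names mul_plain!, mul_simd! and best_time are made up for illustration, and the numbers depend on your machine and array size:

```julia
# Hypothetical helper names; written for Julia 1.x, not v0.4.

# Element-wise A[i] *= B[i] with a plain loop (bounds checks disabled).
function mul_plain!(A::Vector{T}, B::Vector{T}) where {T<:Number}
    length(A) == length(B) || throw(DimensionMismatch("A and B must have the same length"))
    @inbounds for i in eachindex(A)
        A[i] *= B[i]
    end
    return A
end

# Same loop with @simd added.
function mul_simd!(A::Vector{T}, B::Vector{T}) where {T<:Number}
    length(A) == length(B) || throw(DimensionMismatch("A and B must have the same length"))
    @inbounds @simd for i in eachindex(A)
        A[i] *= B[i]
    end
    return A
end

# Best-of-`trials` wall time in seconds. The first call is made
# separately so that compilation is not included in the measurement.
function best_time(f!, A, B; trials = 20)
    f!(copy(A), B)                      # warm-up call (triggers compilation)
    best = Inf
    for _ in 1:trials
        X = copy(A)                     # fresh copy so values do not drift
        best = min(best, @elapsed f!(X, B))
    end
    return best
end

n = 300_000                             # problem size from the original post
A = randn(n); B = randn(n)
println("plain: ", best_time(mul_plain!, A, B), " s")
println("@simd: ", best_time(mul_simd!, A, B), " s")
```

Taking the minimum over several runs is just one simple way to reduce noise. Change n to your own problem size before drawing conclusions.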
On Monday, June 20, 2016 at 2:50:22 PM UTC+1, [email protected] wrote:
>
> Thanks! I'm still using v0.4.5. In this case, is the code I highlighted
> above still the best choice for doing the job?
>
>
> On Monday, June 20, 2016 at 1:57:25 PM UTC+1, Chris Rackauckas wrote:
>>
>> I think that for medium size (but not large) arrays in v0.5 you may want
>> to use @threads from the threading branch, and then for really large
>> arrays you may want to use @parallel. But you'd have to test some timings.
>>
>> On Monday, June 20, 2016 at 11:38:15 AM UTC+1, [email protected] wrote:
>>>
>>> I have the same question regarding how to calculate the entry-wise
>>> vector product and found this thread. As a novice, I wonder if the following
>>> code snippet is still the standard for entry-wise vector multiplication
>>> that one should stick to in practice? Thanks!
>>>
>>>
>>> @fastmath @inbounds @simd for i=1:n
>>>     A[i] *= B[i]
>>> end
>>>
>>>
>>>
>>> On Tuesday, October 6, 2015 at 3:28:29 PM UTC+1, Lionel du Peloux wrote:
>>>>
>>>> Dear all,
>>>>
>>>> I'm looking for the fastest way to do element-wise vector
>>>> multiplication in Julia. The best I could do is the following
>>>> implementation, which still runs 1.5x slower than the dot product. I assume
>>>> the dot product would include such an operation ... and then do a
>>>> cumulative sum over the element-wise product.
>>>>
>>>> The MKL lib includes such an operation (v?Mul) but it seems OpenBLAS
>>>> does not. So my questions are:
>>>>
>>>> 1) Is there any chance I can do vector element-wise multiplication
>>>> faster than the actual dot product?
>>>> 2) Why is the built-in element-wise multiplication operator (.*) much
>>>> slower than my own implementation for such a basic linalg operation (full
>>>> julia)?
>>>>
>>>> Thank you,
>>>> Lionel
>>>>
>>>> Best custom implementation:
>>>>
>>>> function xpy!{T<:Number}(A::Vector{T},B::Vector{T})
>>>>   n = size(A)[1]
>>>>   if n == size(B)[1]
>>>>     for i=1:n
>>>>       @inbounds A[i] *= B[i]
>>>>     end
>>>>   end
>>>>   return A
>>>> end
>>>>
>>>> Benchmark results (JuliaBox, A = randn(300000)):
>>>>
>>>> function                          CPU (s)   GC (%)  ALLOCATION (bytes)  CPU (x)
>>>> dot(A,B)                          1.58e-04    0.00                  16      1.0
>>>> xpy!(A,B)                         2.31e-04    0.00                  80      1.5
>>>> NumericExtensions.multiply!(P,Q)  3.60e-04    0.00                  80      2.3
>>>> xpy!(A,B) - no @inbounds check    4.36e-04    0.00                  80      2.8
>>>> P.*Q                              2.52e-03   50.36             2400512     16.0
