I made some simple changes to your `xpy!` and managed to get it to 
allocate nothing at all, while performing very close to the speed of `dot`. 
I don't know much about `@simd`, but I imagine it could help speed this up 
even further.

The most significant change was switching `size(A)[1]` to `size(A,1)` (and 
similarly for `B`) - the former has to construct and index into a tuple, 
while the latter won't have to do that. `length(A)` would have worked too.
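
For reference, here is a rough sketch of what the change looks like. The exact 
version I ran is in the notebook linked below; in this sketch the signature is 
simplified to plain `Vector` arguments so it does not depend on the 
type-parameter syntax of a particular Julia version, and throwing a 
`DimensionMismatch` on unequal lengths is an addition for illustration (your 
original returns `A` unchanged in that case).

```julia
# Sketch only: simplified signature, and the DimensionMismatch is an
# illustrative addition, not part of the original function.
function xpy!(A::Vector, B::Vector)
    n = length(A)                 # no tuple is built, unlike size(A)[1]
    n == length(B) || throw(DimensionMismatch("A and B must have the same length"))
    @inbounds for i = 1:n
        A[i] *= B[i]              # in-place element-wise product
    end
    return A
end

A = randn(300000)
B = randn(300000)
xpy!(A, B)                        # first call compiles the method
println(@allocated xpy!(A, B))    # bytes allocated by a second call
```

The warm-up call matters: without it, the number reported by `@allocated` 
would include allocations made by the compiler.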

Notebook, also produced on JuliaBox (running Julia 0.4-rc2): 
http://nbviewer.ipython.org/github/tlycken/IJulia-Notebooks/blob/master/dot%20vs%20xpy%21.ipynb

// T

On Tuesday, October 6, 2015 at 4:28:29 PM UTC+2, Lionel du Peloux wrote:
>
> Dear all,
>
> I'm looking for the fastest way to do element-wise vector multiplication 
> in Julia. The best I could have done is the following implementation which 
> still runs 1.5x slower than the dot product. I assume the dot product would 
> include such an operation ... and then do a cumulative sum over the 
> element-wise product.
>
> The MKL lib includes such an operation (v?Mul) but it seems OpenBLAS does 
> not. So my question is :
>
> 1) is there any chance I can do vector element-wise multiplication faster 
> than the actual dot product ?
> 2) why is the built-in element-wise multiplication operator (.*) much 
> slower than my own implementation for such a basic linear algebra operation 
> (full julia) ? 
>
> Thank you,
> Lionel
>
> Best custom implementation :
>
> function xpy!{T<:Number}(A::Vector{T},B::Vector{T})
>   n = size(A)[1]
>   if n == size(B)[1]
>     for i=1:n
>       @inbounds A[i] *= B[i]
>     end
>   end
>   return A
> end
>
> Benchmark results (JuliaBox, A = randn(300000)):
>
> function                          CPU (s)     GC (%)  ALLOCATION (bytes)  CPU (x)
> dot(A,B)                          1.58e-04    0.00    16                  1.0
> xpy!(A,B)                         2.31e-04    0.00    80                  1.5
> NumericExtensions.multiply!(P,Q)  3.60e-04    0.00    80                  2.3
> xpy!(A,B) - no @inbounds check    4.36e-04    0.00    80                  2.8
> P.*Q                              2.52e-03    50.36   2400512             16.0
> ############################################################
>
>
