OK, good to know. I think putting the function in a package is overkill.
> On 2 Nov. 2016, at 6:35 pm, Chris Rackauckas <rackd...@gmail.com> wrote:
>
> Yes, this most likely won't help for GPU arrays because you likely don't want
> to be looping through elements serially: you want to call a vectorized GPU
> function which will do the computation in parallel on the GPU. ArrayFire's
> mathematical operations are already overloaded to do this, but I don't think
> they can fuse.
>
> On Tuesday, November 1, 2016 at 8:06:12 PM UTC-7, Sheehan Olver wrote:
> Ah thanks!
>
> Though I guess if I want the same code to work also on a GPU array then this
> won't help?
>
> Sent from my iPhone
>
> On 2 Nov. 2016, at 13:51, Chris Rackauckas <rack...@gmail.com> wrote:
>
>> It's the other way around. .* won't fuse because it's still an operator. .=
>> will. If you want .* to fuse, you can instead do:
>>
>> A .= *.(A,B)
>>
>> since this invokes the broadcast on *, instead of invoking .*. But that's
>> just a temporary thing.
>>
>> On Tuesday, November 1, 2016 at 7:27:40 PM UTC-7, Tom Breloff wrote:
>> As I understand it, the .* will fuse, but the .= will not (until 0.6?), so A
>> will be rebound to a newly allocated array. If my understanding is wrong
>> I'd love to know. There have been many times in the last few days that I
>> would have used it...
>>
>> On Tue, Nov 1, 2016 at 10:06 PM, Sheehan Olver <dlfiv...@gmail.com> wrote:
>> Ah, good point. Though I guess that won't work until 0.6 since .* won't
>> auto-fuse yet?
>>
>> Sent from my iPhone
>>
>> On 2 Nov. 2016, at 12:55, Chris Rackauckas <rack...@gmail.com> wrote:
>>
>>> This is pretty much obsoleted by the . fusing changes:
>>>
>>> A .= A.*B
>>>
>>> should be an in-place update of A scaled by B (Tomas' solution).
>>>
>>> On Tuesday, November 1, 2016 at 4:39:15 PM UTC-7, Sheehan Olver wrote:
>>> Should this be added to a package? I imagine if the arrays are on the GPU
>>> (AFArrays) then the operation could be much faster, and having a consistent
>>> name would be helpful.
>>>
>>>
>>> On Wednesday, October 7, 2015 at 1:28:29 AM UTC+11, Lionel du Peloux wrote:
>>> Dear all,
>>>
>>> I'm looking for the fastest way to do element-wise vector multiplication in
>>> Julia. The best I could do is the following implementation, which
>>> still runs 1.5x slower than the dot product. I assume the dot product would
>>> include such an operation ... and then do a cumulative sum over the
>>> element-wise product.
>>>
>>> The MKL lib includes such an operation (v?Mul) but it seems OpenBLAS does
>>> not. So my questions are:
>>>
>>> 1) is there any chance I can do vector element-wise multiplication faster
>>> than the actual dot product?
>>> 2) why is the built-in element-wise multiplication operator (.*) much
>>> slower than my own implementation for such a basic linalg operation (full
>>> julia)?
>>>
>>> Thank you,
>>> Lionel
>>>
>>> Best custom implementation:
>>>
>>> function xpy!{T<:Number}(A::Vector{T},B::Vector{T})
>>>     n = size(A)[1]
>>>     if n == size(B)[1]
>>>         for i=1:n
>>>             @inbounds A[i] *= B[i]
>>>         end
>>>     end
>>>     return A
>>> end
>>>
>>> Benchmark results (JuliaBox, A = randn(300000)):
>>>
>>> function                           CPU (s)    GC (%)   ALLOCATION (bytes)   CPU (x)
>>> dot(A,B)                           1.58e-04    0.00                   16       1.0
>>> xpy!(A,B)                          2.31e-04    0.00                   80       1.5
>>> NumericExtensions.multiply!(P,Q)   3.60e-04    0.00                   80       2.3
>>> xpy!(A,B) - no @inbounds check     4.36e-04    0.00                   80       2.8
>>> P.*Q                               2.52e-03   50.36              2400512      16.0
>>> ############################################################
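
For anyone reading this thread later, here is a rough sketch of the two in-place variants discussed above, written for Julia 1.x. It assumes a version where dotted operators and .= fuse into a single loop, which was not yet true when the thread started. The names xpy_broadcast! and xpy_loop! are made up for illustration and do not come from any package. Unlike the original xpy!, both throw on a length mismatch instead of silently returning A unchanged.

```julia
# Sketch only: assumes Julia 1.x, where dotted operators and .= fuse into one loop.

# Hypothetical name: broadcast version of the in-place element-wise product.
function xpy_broadcast!(A::AbstractVector, B::AbstractVector)
    # Throw on a length mismatch rather than silently leaving A untouched.
    length(A) == length(B) || throw(DimensionMismatch("A and B must have the same length"))
    A .*= B   # same as A .= A .* B; the result is written into A's existing storage
    return A
end

# Hypothetical name: explicit-loop version of the original xpy!, using `where` syntax.
function xpy_loop!(A::Vector{T}, B::Vector{T}) where {T<:Number}
    length(A) == length(B) || throw(DimensionMismatch("A and B must have the same length"))
    @inbounds for i in eachindex(A, B)
        A[i] *= B[i]
    end
    return A
end

A = [1.0, 2.0, 3.0]
B = [4.0, 5.0, 6.0]
xpy_broadcast!(A, B)   # A now holds the element-wise product of the two vectors
```

Whether the broadcast form is also the right choice for GPU arrays depends on the array package and is not something this sketch checks.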