Yes, this most likely won't help for GPU arrays, because you don't want to 
be looping through the elements serially: you want to call a vectorized GPU 
function that does the computation in parallel on the GPU. ArrayFire's 
mathematical operations are already overloaded to do this, but I don't 
think they can fuse.
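
For example, with ArrayFire.jl the overloaded operators already dispatch to 
GPU kernels, so the same element-wise expression works on device arrays. A 
minimal sketch (assuming ArrayFire.jl with a working ArrayFire backend, and 
that the AFArray constructor accepts a host array):

using ArrayFire

A = AFArray(rand(Float32, 300000))   # copy host data to the device
B = AFArray(rand(Float32, 300000))

C = A .* B       # element-wise multiply runs on the GPU via ArrayFire's overload
D = Array(C)     # copy the result back to a host Array when needed

A hand-written loop with scalar indexing into an AFArray, by contrast, would 
touch the device one element at a time (if it works at all), which is exactly 
what you want to avoid.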

On Tuesday, November 1, 2016 at 8:06:12 PM UTC-7, Sheehan Olver wrote:
>
> Ah thanks!
>
> Though I guess if I want the same code to work also on a GPU array then 
> this won't help?
>
> Sent from my iPhone
>
> On 2 Nov. 2016, at 13:51, Chris Rackauckas <rack...@gmail.com> wrote:
>
> It's the other way around: .* won't fuse because it's still an operator, 
> but .= will. If you want .* to fuse, you can instead do:
>
> A .= *.(A,B)
>
> since this invokes broadcast on *, instead of invoking .*. But that's 
> just a temporary workaround.
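>
> A quick REPL sketch of that (Julia 0.5 syntax; the values are made up just 
> for illustration):
>
> julia> A = [1.0, 2.0, 3.0]; B = [10.0, 20.0, 30.0];
>
> julia> A .= *.(A, B);   # lowers to an in-place broadcast! of *, so A is mutated, not rebound
>
> julia> A
> 3-element Array{Float64,1}:
>   10.0
>   40.0
>   90.0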
>
> On Tuesday, November 1, 2016 at 7:27:40 PM UTC-7, Tom Breloff wrote:
>>
>> As I understand it, the .* will fuse, but the .= will not (until 0.6?), 
>> so A will be rebound to a newly allocated array.  If my understanding is 
>> wrong I'd love to know.  There have been many times in the last few days 
>> that I would have used it...
>>
>> On Tue, Nov 1, 2016 at 10:06 PM, Sheehan Olver <dlfiv...@gmail.com> 
>> wrote:
>>
>>> Ah, good point.  Though I guess that won't work until 0.6, since .* won't 
>>> auto-fuse yet? 
>>>
>>> Sent from my iPhone
>>>
>>> On 2 Nov. 2016, at 12:55, Chris Rackauckas <rack...@gmail.com> wrote:
>>>
>>> This is pretty much made obsolete by the . fusing changes:
>>>
>>> A .= A.*B
>>>
>>> should be an in-place update of A scaled by B (Tomas' solution).
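>>>
>>> On 0.6, where .* also participates in fusion, that line lowers to a single 
>>> in-place broadcast! call, so no temporary array is allocated. A sketch (not 
>>> benchmarked here):
>>>
>>> A = rand(300000); B = rand(300000);
>>> A .= A .* B    # equivalent to broadcast!(*, A, A, B); A is updated in place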
>>>
>>> On Tuesday, November 1, 2016 at 4:39:15 PM UTC-7, Sheehan Olver wrote:
>>>>
>>>> Should this be added to a package?  I imagine if the arrays are on the 
>>>> GPU (AFArrays) then the operation could be much faster, and having a 
>>>> consistent name would be helpful.
>>>>
>>>>
>>>> On Wednesday, October 7, 2015 at 1:28:29 AM UTC+11, Lionel du Peloux 
>>>> wrote:
>>>>>
>>>>> Dear all,
>>>>>
>>>>> I'm looking for the fastest way to do element-wise vector 
>>>>> multiplication in Julia. The best I could do is the following 
>>>>> implementation, which still runs 1.5x slower than the dot product. I 
>>>>> assume the dot product would include such an operation ... and then do 
>>>>> a cumulative sum over the element-wise products.
>>>>>
>>>>> The MKL lib includes such an operation (v?Mul) but it seems OpenBLAS 
>>>>> does not. So my questions are:
>>>>>
>>>>> 1) Is there any chance I can do vector element-wise multiplication 
>>>>> faster than the actual dot product?
>>>>> 2) Why is the built-in element-wise multiplication operator (.*) so 
>>>>> much slower than my own implementation for such a basic linear-algebra 
>>>>> operation (in pure Julia)? 
>>>>>
>>>>> Thank you,
>>>>> Lionel
>>>>>
>>>>> Best custom implementation:
>>>>>
>>>>> function xpy!{T<:Number}(A::Vector{T},B::Vector{T})
>>>>>   n = size(A)[1]
>>>>>   if n == size(B)[1]
>>>>>     for i=1:n
>>>>>       @inbounds A[i] *= B[i]
>>>>>     end
>>>>>   end
>>>>>   return A
>>>>> end
>>>>>
>>>>> Benchmark results (JuliaBox, A = randn(300000)):
>>>>>
>>>>> function                          CPU (s)   GC (%)  ALLOCATION (bytes)  CPU (x)
>>>>> dot(A,B)                          1.58e-04  0.00    16                  1.0
>>>>> xpy!(A,B)                         2.31e-04  0.00    80                  1.5
>>>>> NumericExtensions.multiply!(P,Q)  3.60e-04  0.00    80                  2.3
>>>>> xpy!(A,B) - no @inbounds check    4.36e-04  0.00    80                  2.8
>>>>> P.*Q                              2.52e-03  50.36   2400512             16.0
>>>>>
>>>>>
>>
