My understanding is that an expression like tmp .-= 4z allocates two 
temporary arrays. It expands to tmp = tmp .- 4z, so one array is allocated 
for 4z and another for tmp .- 4z (tmp is then rebound to the latter).
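
To make that concrete, here is a minimal sketch of the expansion written out 
by hand. The names t1, t2, and old are mine, purely for illustration, and the 
sample values are made up:

```julia
# Hand-written version of the expansion tmp = tmp .- 4z.
# t1, t2, and old are hypothetical names used only for this illustration.
z   = [1.0, 2.0, 3.0]
tmp = [10.0, 20.0, 30.0]
old = tmp                # keep a handle on the original array

t1  = 4z                 # first temporary: a new array holding 4 times z
t2  = tmp .- t1          # second temporary: a new array holding the difference
tmp = t2                 # tmp is rebound; the original array is not modified

println(tmp === old)     # false: tmp now names a different array
println(old)             # the original contents are unchanged
```

Written this way, both allocations are visible as separate bindings, which is 
the behavior I am describing for the combined expression.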

On Friday, December 18, 2015 at 2:11:48 PM UTC-8, feza wrote:

> I think I am misunderstanding the temporary array allocation process. Is it 
> allocating one or two temp arrays? Where have I gone wrong here: 
>
> tmp = 2y    (allocates a temporary array to store the result)
> tmp .-= 4z  (also allocates a temporary array for 4z? Why not just use z 
> directly, i.e. tmp[i] = tmp[i] - 4*z[i]?)
> tmp ./= w   (uses the previous temp array and w to do the division, 
> overwriting tmp, i.e. loops over tmp[i] = tmp[i]/w[i])
> x .+= tmp   (performs x[i] = x[i] + tmp[i])
>
>
>
> On Friday, December 18, 2015 at 1:53:02 PM UTC-5, Steven G. Johnson wrote:
>>
>>
>>
>> On Friday, December 18, 2015 at 1:32:16 PM UTC-5, Ethan Anderes wrote:
>>>
>>> Ok, thanks for the info (and @inbounds does improve it a bit). I 
>>> usually follow your advice and fuse the operations together when I need the 
>>> speed, but since I do all manner of combinations of vectorized operations 
>>> throughout my module I tend to prefer using .*=, ./=, etc. unless I 
>>> really need the speed.
>>>
>> Having "all manner of combinations" of these operations is a good reason 
>> *not* to define in-place versions of these operations.  For example, 
>> imagine the computation:
>>
>> x = x + (2y - 4z) ./ w
>>
>>
>> With your proposed in-place assignment operations, I guess this would 
>> become:
>>
>> tmp = 2y
>> tmp .-= 4z
>> tmp ./= w
>> x .+= tmp
>>
>>
>> which still allocates two temporary arrays (one for tmp and one for 4z), 
>> and involves five separate loops.  Compare to:
>>
>> for i in eachindex(x)
>>     x[i] += (2y[i] - 4z[i]) / w[i]
>> end
>>
>>
>> which involves only one loop (and probably better cache performance as a 
>> result) and no temporary arrays.  (You can add @inbounds if you want a bit 
>> more performance and know that w/x/y/z have the same shape.)  Not only is 
>> it more efficient than a sequence of in-place assignments, but I would 
>> argue that it is much more readable as well, despite the need for an 
>> explicit loop.
>>
>> Alternatively, you can use the Devectorize package, and something like
>>
>> @devec x[:] = x + (2y - 4z) ./ w
>>
>>
>> will basically do the same thing as the loop if I understand @devec 
>> correctly.
>>
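
For what it's worth, here is a minimal sketch of the single-loop version 
wrapped in a function, assuming all four arrays have the same shape. The name 
fused_update! and the sample data are hypothetical, not something from the 
thread:

```julia
# Hypothetical helper (not from the thread): the single-loop form as a function.
function fused_update!(x, y, z, w)
    # @inbounds skips bounds checks, so verify the shapes up front.
    size(x) == size(y) == size(z) == size(w) ||
        throw(DimensionMismatch("x, y, z, and w must have the same shape"))
    @inbounds for i in eachindex(x)
        x[i] += (2y[i] - 4z[i]) / w[i]   # one pass, no temporary arrays
    end
    return x
end

# Made-up sample data for illustration.
x = [1.0, 2.0, 3.0]
y = [1.0, 2.0, 3.0]
z = [0.5, 0.5, 0.5]
w = [2.0, 4.0, 8.0]

expected = x + (2y - 4z) ./ w   # vectorized reference; allocates temporaries
fused_update!(x, y, z, w)       # updates x in place
println(x == expected)
```

The shape check is there only because @inbounds is unsafe when the arrays 
differ in size; drop both if you would rather keep the bounds checks.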