On Friday, December 18, 2015 at 1:32:16 PM UTC-5, Ethan Anderes wrote:
>
> Ok, thanks for the info (and @inbounds does improve it a bit). I usually
> follow your advice and fuse the operations together when I need the speed,
> but since I do all manner of combinations of vectorized operations
> throughout my module I tend to prefer using .*=, ./=, etc unless I need
> it.
>
Having "all manner of combinations" of these operations is a good reason
*not* to define in-place versions of these operations. For example,
imagine the computation:
x = x + (2y - 4z) ./ w
With your proposed in-place assignment operators, I guess this would
become:
tmp = 2y
tmp .-= 4z
tmp ./= w
x .+= tmp
which still allocates two temporary arrays (one for tmp and one for 4z),
and involves five separate loops (one each for 2y and 4z, plus one for each
of the three in-place updates). Compare to:
for i in eachindex(x)
    x[i] += (2y[i] - 4z[i]) / w[i]
end
which involves only one loop (and probably better cache performance as a
result) and no temporary arrays. (You can add @inbounds if you want a bit
more performance and know that w/x/y/z have the same shape.) Not only is
it more efficient than a sequence of in-place assignments, but I would
argue that it is much more readable as well, despite the need for an
explicit loop.
Alternatively, you can use the Devectorize package, and something like
@devec x[:] = x + (2y - 4z) ./ w
will basically do the same thing as the loop, if I understand @devec
correctly.
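
If it helps, here is a minimal sketch of the explicit loop wrapped in a function, with @inbounds guarded by a shape check, and compared against the vectorized expression on some small made-up arrays. The name fused_update! and the sample values are my own, purely for illustration.

```julia
# Hypothetical helper (the name is illustrative, not from any package):
# computes x = x + (2y - 4z) ./ w in place with a single loop.
function fused_update!(x, y, z, w)
    # @inbounds is only safe when all four arrays have the same shape,
    # so check that up front.
    size(x) == size(y) == size(z) == size(w) ||
        throw(DimensionMismatch("w, x, y and z must have the same shape"))
    @inbounds for i in eachindex(x)
        x[i] += (2y[i] - 4z[i]) / w[i]
    end
    return x
end

# Small made-up inputs for a sanity check.
x = [1.0, 2.0, 3.0]
y = [4.0, 5.0, 6.0]
z = [1.0, 1.0, 1.0]
w = [2.0, 4.0, 8.0]

expected = x + (2y - 4z) ./ w   # vectorized reference; allocates temporaries
fused_update!(x, y, z, w)       # one loop; x is updated in place
println(x ≈ expected)
```

Putting the loop in a function also matters for speed in its own right, since loops over global variables at top level are slow in Julia.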