My understanding is that an expression like tmp .-= 4z generates two temporary arrays. It expands to tmp = tmp .- 4z, so there is one temporary array for 4z and then another for the result of tmp .- 4z (tmp is then rebound to that result).
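To make the two allocations visible, here is a minimal sketch that writes the expansion out by hand. The names t4z and before, and the sample values, are mine and purely illustrative. Newer Julia releases lower .-= differently, so this shows the expansion as described here, not what every Julia version does.

```julia
# Illustrative data (assumed values, not from the thread)
y = [1.0, 2.0, 3.0]
z = [0.25, 0.5, 1.0]

tmp = 2y          # first array: holds 2y
before = tmp      # keep a second handle on that array for comparison

# Hand-written form of the expansion tmp = tmp .- 4z
t4z = 4z          # temporary 1: holds 4z
tmp = tmp .- t4z  # temporary 2: a new array; tmp is rebound to it

println(tmp === before)  # false if tmp now refers to a different array
```

The array that tmp originally pointed to is left untouched; only the name tmp moves to the newly allocated result.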
On Friday, December 18, 2015 at 2:11:48 PM UTC-8, feza wrote:
> I think I am misunderstanding the temporary array allocation process. Is it
> allocating one or two temp arrays? Where have I gone wrong here:
>
> tmp = 2y (allocates a temporary array to store result)
> tmp .-= 4z (also allocates a temporary array for 4z? Why not just use z
> directly, thus tmp[i] = tmp[i] - 4*z[i] )
> tmp ./= w (Uses previous temp array and w to do the division overwriting
> tmp, i.e. loops over tmp[i] = tmp[i]/w[i] )
> x .+= tmp (performs x[i] = x[i] + tmp[i] )
>
> On Friday, December 18, 2015 at 1:53:02 PM UTC-5, Steven G. Johnson wrote:
>>
>> On Friday, December 18, 2015 at 1:32:16 PM UTC-5, Ethan Anderes wrote:
>>>
>>> Ok, thanks for the info (and @inbounds does improve it a bit). I
>>> usually follow your advice and fuse the operations together when I need the
>>> speed, but since I do all manner of combinations of vectorized operations
>>> throughout my module I tend to prefer using .*=, ./=, etc unless I need
>>> it.
>>>
>> Having "all manner of combinations" of these operations is a good reason
>> *not* to define in-place versions of these operations. For example,
>> imagine the computation:
>>
>>     x = x + (2y - 4z) ./ w
>>
>> with your proposed in-place assignment operations, I guess this would
>> become:
>>
>>     tmp = 2y
>>     tmp .-= 4z
>>     tmp ./= w
>>     x .+= tmp
>>
>> which still allocates two temporary arrays (one for tmp and one for 4z),
>> and involves five separate loops. Compare to:
>>
>>     for i in eachindex(x)
>>         x[i] += (2y[i] - 4z[i]) / w[i]
>>     end
>>
>> which involves only one loop (and probably better cache performance as a
>> result) and no temporary arrays. (You can add @inbounds if you want a bit
>> more performance and know that w/x/y/z have the same shape.) Not only is
>> it more efficient than a sequence of in-place assignments, but I would
>> argue that it is much more readable as well, despite the need for an
>> explicit loop.
>>
>> Alternatively, you can use the Devectorize package, and something like
>>
>>     @devec x[:] = x + (2y - 4z) ./ w
>>
>> will basically do the same thing as the loop if I understand @devec
>> correctly.
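
For anyone who wants to try the single-loop version quoted above, here is a small self-contained sketch. The function name fused_update! and the sample values are assumptions for illustration only; the loop body is the one from the quoted message. It assumes w, x, y, and z all have the same shape.

```julia
# Hypothetical helper: updates x in place with one loop and no temporary arrays.
function fused_update!(x, y, z, w)
    for i in eachindex(x)
        x[i] += (2y[i] - 4z[i]) / w[i]
    end
    return x
end

# Illustrative data (assumed values, not from the thread)
x = [1.0, 2.0, 3.0]
y = [1.0, 2.0, 3.0]
z = [0.25, 0.5, 1.0]
w = [2.0, 4.0, 8.0]

# Vectorized form for comparison; this one allocates intermediate arrays.
expected = x + (2y - 4z) ./ w

fused_update!(x, y, z, w)
println(x == expected)
```

Wrapping the loop in a function also matters for speed, since loops over non-constant globals at top level tend to be slow in Julia.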
