It's not just escape analysis, as this (new) issue demonstrates: https://github.com/JuliaLang/julia/issues/10899
--Tim On Sunday, April 19, 2015 12:33:51 PM Sebastian Good wrote: > Their size seems much decreased. I’d imagine to totally avoid allocation in > this benchmark requires an optimization that really has nothing to do with > subarrays per se. You’d have to do an escape analysis and see that Aj never > left sumcols. Not easy in practice, since it’s passed to slice and length, > and you’d have to make sure they didn’t squirrel it away or pass it on to > someone else. Then you could stack allocate it, or even destructure it into > a bunch of scalar mutations on the stack. After eliminating dead code, > you’d end up with a no-allocation loop much like you’d write by hand. This > sort of optimization seems to be quite tricky for compilers to pull off, > but it’s a common pattern in numerical code. > > In Julia is such cleverness left entirely to LLVM, or are there optimization > passes in Julia itself? On April 19, 2015 at 6:49:21 AM, Tim Holy > ([email protected]) wrote: > > Sorry to be slow to chime in here, but the tuple overhaul has landed and > they are still not zero-cost: > > function sumcols(A) > s = 0.0 > for j = 1:size(A,2) > Aj = slice(A, :, j) > for i = 1:length(Aj) > s += Aj[i] > end > end > s > end > > Even in the latest 0.4, this still allocates memory. On the other hand, > while SubArrays allocate nearly 2x more memory than ArrayViews, the speed > of the two (replacing `slice` with `view` above) is, for me, nearly > identical. > > --Tim > > On Friday, April 17, 2015 08:30:27 PM Sebastian Good wrote: > > This was discussed a few weeks ago > > > > https://groups.google.com/d/msg/julia-users/IxrvV8ABZoQ/uWZu5-IB3McJ > > > > I think the bottom line is that the current implementation *should* be > > 'zero-cost' once a set of planned improvements and optimizations take > > place. One of the key ones is a tuple overhaul. > > > > Fair to say it can never be 'zero' cost since there is different inherent > > overhead depending on the type of subarray, e.g. offset, slice, > > re-dimension, etc. however the implementation is quite clever about > > allowing specialization of those. > > > > In a common case (e.g. a constant offset or simple stride) my > > understanding > > is that the structure will be type-specialized and likely stack allocated > > in many cases, reducing to what you'd write by hand. At least this is what > > they're after. > > > > On Friday, April 17, 2015 at 4:24:14 PM UTC-4, Peter Brady wrote: > > > Thanks for the links. I'll check out ArrayViews as it looks like what I > > > was going to do manually without wrapping it in a type. > > > > > > By semi-dim agnostic I meant that the differencing algorithm itself only > > > cares about one dimension but that dimension is different for different > > > directions. Only a few toplevel routines actually need to know about the > > > dimensionality of the problem. > > > > > > On Friday, April 17, 2015 at 2:04:39 PM UTC-6, René Donner wrote: > > >> As far as I have measured it sub in 0.4 is still not cheap, as it > > >> provides the flexibility to deal with all kinds of strides and offsets, > > >> and > > >> the view object itself thus has a certain size. See > > >> https://github.com/rened/FunctionalData.jl#efficiency for a simple > > >> analysis, where the speed is mostly dominated by the speed of the > > >> "sub-view" mechanism. > > >> > > >> To get faster views which require strides etc you can try > > >> https://github.com/JuliaLang/ArrayViews.jl > > >> > > >> What do you mean by semi-dim agnostic? In case you only need indexing > > >> along the last dimension (like a[:,:,i] and a[:,:,:,i]) you can use > > >> > > >> https://github.com/rened/FunctionalData.jl#efficient-views-details > > >> > > >> which uses normal DenseArrays and simple pointer updates internally. It > > >> can also update a view in-place, by just incrementing the pointer. > > >> > > >> Am 17.04.2015 um 21:48 schrieb Peter Brady <[email protected]>: > > >> > Inorder to write some differencing algorithms in a semi-dimensional > > >> > > >> agnostic manner the code I've written makes heavy use of subarrays > > >> which > > >> turn out to be rather costly. I've noticed some posts on the cost of > > >> subarrays here and that things will be better in 0.4. Can someone > > >> comment > > >> on how much better? Would subarray (or anything like it) be on par with > > >> simply passing an offset and stride (constant) and computing the index > > >> myself? I'm currently using the 0.3 release branch.
