My benchmark shows that element indexing is already as fast as it can be for array views (or subarrays in Julia 0.4).
The bottleneck now is actually the construction of the views/subarrays themselves. Reducing that overhead will probably require additional optimizations in the compiler.

Dahua

On Monday, April 20, 2015 at 6:39:35 AM UTC+8, Sebastian Good wrote:
> --track-allocation still requires guesswork, as optimizations can move the allocation to a different place than you would expect.
>
> On April 19, 2015 at 4:36:19 PM, Peter Brady ([email protected]) wrote:
>
> So I discovered the --track-allocation option and now I am really confused:
>
> Here's my session:
>
> $ julia --track-allocation=all
>    A fresh approach to technical computing
>    Documentation: http://docs.julialang.org
>    Type "help()" for help.
>    Version 0.3.8-pre+13 (2015-04-17 18:08 UTC)
>    Commit 0df962d* (2 days old release-0.3)
>    x86_64-redhat-linux
>
> julia> include("test.jl")
> test_all (generic function with 1 method)
>
> julia> test_unsafe(5)
>
> And here's the relevant part of the resulting test.jl.mem file. Note that I commented out some calls to 'size' and replaced them with the appropriate hard-coded values, but the resulting allocation is the same... Can anyone shed some light on this while I wait for 0.4 to compile?
>
>         - function update(a::AbstractArray, idx, off)
>   8151120     for i=1:320 #size(a, idx)
>         0         a[i] = -10*off+i
>         -     end
>         0     a
>         - end
>         -
>         - function setk_UnSafe{T}(a::Array{T,3})
>       760     us = UnsafeSlice(a, 3)
>         0     for j=1:size(a,2),i=1:size(a,1)
>   8151120         us.start = (j-1)*320+i #size(a,1)+i
>         -         #off = sub2ind(size(a), i, j, 1)
>         0         update(us, 3, us.start)
>         -     end
>         0     a
>         - end
>         - function test_unsafe(n)
>         0     a = zeros(Int, (320, 320, 320))
>         -     # warmup
>         0     setk_UnSafe(a);
>         0     clear_malloc_data()
>         -     #@time (
>         0     for i=1:n; setk_UnSafe(a); end
>         - end
>
> On Sunday, April 19, 2015 at 2:21:56 PM UTC-6, Peter Brady wrote:
>> @Dahua, thanks for adding an unsafeview! I appreciate how quickly this community responds.
>>
>> I've added the following function to my test.jl script:
>>
>> function setk_unsafeview{T}(a::Array{T,3})
>>     for j=1:size(a,2),i=1:size(a,1)
>>         off = sub2ind(size(a), i, j, 1)
>>         update(unsafe_view(a, i, j, :), 3, off)
>>     end
>>     a
>> end
>>
>> But I'm not seeing the large increase in performance I was expecting. My timings are now:
>>
>> julia> test_all(5);
>> test_stride
>> elapsed time: 2.156173128 seconds (0 bytes allocated)
>> test_view
>> elapsed time: 9.30964534 seconds (94208000 bytes allocated, 0.47% gc time)
>> test_unsafe
>> elapsed time: 2.169307471 seconds (16303000 bytes allocated)
>> test_unsafeview
>> elapsed time: 8.955876793 seconds (90112000 bytes allocated, 0.41% gc time)
>>
>> To be fair, I am cheating a bit with my custom 'UnsafeSlice', since I make only one instance and simply update the offset on each iteration. If I make it immutable and create a new instance on every iteration (as I do for the view and unsafeview), things slow down a little and the allocation goes south:
>>
>> julia> test_all(5);
>> test_stride
>> elapsed time: 2.159909265 seconds (0 bytes allocated)
>> test_view
>> elapsed time: 9.029025282 seconds (94208000 bytes allocated, 0.43% gc time)
>> test_unsafe
>> elapsed time: 2.621667854 seconds (114606240 bytes allocated, 2.41% gc time)
>> test_unsafeview
>> elapsed time: 8.888434466 seconds (90112000 bytes allocated, 0.44% gc time)
>>
>> These are all with 0.3.8-pre. I'll try compiling master and see what happens. I'm still confused about why allocating a single type with a pointer, 2 ints and a tuple costs so much memory, though.
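The UnsafeSlice type itself never appears in the thread, so the following is only a guess at a minimal version consistent with how it is used above and with Peter's description of "a pointer, 2 ints and a tuple"; the field names, the constructor, and the choice to report the parent's dimensions from size are all assumptions, not Peter's actual code:

    type UnsafeSlice{T} <: AbstractArray{T,3}
        ptr::Ptr{T}          # raw pointer to the parent array's data
        start::Int           # 1-based linear offset of the slice's first element (mutated between iterations)
        stride::Int          # spacing between consecutive elements of the slice
        dims::NTuple{3,Int}  # shape of the parent array, so size(us, idx) inside update() gives the slice length
    end

    # slice along dimension `dim`, starting at the first element of the parent
    UnsafeSlice{T}(a::Array{T,3}, dim::Int) =
        UnsafeSlice(pointer(a), 1, stride(a, dim), size(a))

    Base.size(s::UnsafeSlice) = s.dims
    # linear indexing walks the slice (start plus stride-sized steps), not the parent
    Base.getindex(s::UnsafeSlice, i::Int) =
        unsafe_load(s.ptr, s.start + (i - 1)*s.stride)
    Base.setindex!{T}(s::UnsafeSlice{T}, v, i::Int) =
        unsafe_store!(s.ptr, convert(T, v), s.start + (i - 1)*s.stride)

The important part for the benchmark is that start is a plain mutable field: setk_UnSafe builds one wrapper per call and then just rewrites start for each (i, j), so the inner loop constructs nothing. The immutable variant has to build a new wrapper for every (i, j), which is where the extra allocation and GC time in the second set of timings come from.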
>> On Sunday, April 19, 2015 at 11:38:17 AM UTC-6, Tim Holy wrote:
>>> It's not just escape analysis, as this (new) issue demonstrates:
>>> https://github.com/JuliaLang/julia/issues/10899
>>>
>>> --Tim
>>>
>>> On Sunday, April 19, 2015 12:33:51 PM Sebastian Good wrote:
>>> > Their size seems much decreased. I'd imagine that totally avoiding allocation in this benchmark requires an optimization that really has nothing to do with subarrays per se. You'd have to do an escape analysis and see that Aj never left sumcols. That's not easy in practice, since it's passed to slice and length, and you'd have to make sure they didn't squirrel it away or pass it on to someone else. Then you could stack-allocate it, or even destructure it into a bunch of scalar mutations on the stack. After eliminating dead code, you'd end up with a no-allocation loop much like you'd write by hand. This sort of optimization seems to be quite tricky for compilers to pull off, but it's a common pattern in numerical code.
>>> >
>>> > In Julia, is such cleverness left entirely to LLVM, or are there optimization passes in Julia itself?
>>> >
>>> > On April 19, 2015 at 6:49:21 AM, Tim Holy ([email protected]) wrote:
>>> >
>>> > Sorry to be slow to chime in here, but the tuple overhaul has landed and they are still not zero-cost:
>>> >
>>> > function sumcols(A)
>>> >     s = 0.0
>>> >     for j = 1:size(A,2)
>>> >         Aj = slice(A, :, j)
>>> >         for i = 1:length(Aj)
>>> >             s += Aj[i]
>>> >         end
>>> >     end
>>> >     s
>>> > end
>>> >
>>> > Even in the latest 0.4, this still allocates memory. On the other hand, while SubArrays allocate nearly 2x more memory than ArrayViews, the speed of the two (replacing `slice` with `view` above) is, for me, nearly identical.
>>> >
>>> > --Tim
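For comparison, the no-allocation loop "much like you'd write by hand" that Sebastian describes is, for this example, just direct indexing of the parent; the sketch below (the name sumcols_byhand is illustrative) is what such an escape-analysis optimization would effectively reduce sumcols to:

    function sumcols_byhand(A::AbstractMatrix)
        s = 0.0
        for j = 1:size(A, 2)
            for i = 1:size(A, 1)
                s += A[i, j]   # index the parent directly; no per-column view is ever constructed
            end
        end
        s
    end

Nothing is constructed per column, so there is nothing to stack-allocate or eliminate in the first place.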
>>> > On Friday, April 17, 2015 08:30:27 PM Sebastian Good wrote:
>>> > > This was discussed a few weeks ago:
>>> > > https://groups.google.com/d/msg/julia-users/IxrvV8ABZoQ/uWZu5-IB3McJ
>>> > >
>>> > > I think the bottom line is that the current implementation *should* be 'zero-cost' once a set of planned improvements and optimizations take place. One of the key ones is a tuple overhaul.
>>> > >
>>> > > It's fair to say it can never be truly 'zero' cost, since there is different inherent overhead depending on the type of subarray (e.g. offset, slice, re-dimension, etc.); however, the implementation is quite clever about allowing specialization for those.
>>> > >
>>> > > In a common case (e.g. a constant offset or a simple stride), my understanding is that the structure will be type-specialized and likely stack-allocated in many cases, reducing to what you'd write by hand. At least that is what they're after.
>>> > >
>>> > > On Friday, April 17, 2015 at 4:24:14 PM UTC-4, Peter Brady wrote:
>>> > > > Thanks for the links. I'll check out ArrayViews, as it looks like what I was going to do manually, without wrapping it in a type.
>>> > > >
>>> > > > By semi-dim agnostic I meant that the differencing algorithm itself only cares about one dimension, but that dimension is different for different directions. Only a few top-level routines actually need to know about the dimensionality of the problem.
>>> > > >
>>> > > > On Friday, April 17, 2015 at 2:04:39 PM UTC-6, René Donner wrote:
>>> > > >> As far as I have measured it, sub in 0.4 is still not cheap, as it provides the flexibility to deal with all kinds of strides and offsets, and the view object itself thus has a certain size. See https://github.com/rened/FunctionalData.jl#efficiency for a simple analysis, where the speed is mostly dominated by the speed of the "sub-view" mechanism.
>>> > > >>
>>> > > >> To get faster views which require strides etc. you can try https://github.com/JuliaLang/ArrayViews.jl
>>> > > >>
>>> > > >> What do you mean by semi-dim agnostic? In case you only need indexing along the last dimension (like a[:,:,i] and a[:,:,:,i]) you can use https://github.com/rened/FunctionalData.jl#efficient-views-details, which uses normal DenseArrays and simple pointer updates internally. It can also update a view in place, by just incrementing the pointer.
>>> > > >>
>>> > > >> On 17.04.2015 at 21:48, Peter Brady <[email protected]> wrote:
>>> > > >> > In order to write some differencing algorithms in a semi-dimension-agnostic manner, the code I've written makes heavy use of subarrays, which turn out to be rather costly. I've noticed some posts here on the cost of subarrays, and that things will be better in 0.4. Can someone comment on how much better? Would a subarray (or anything like it) be on par with simply passing an offset and a (constant) stride and computing the index myself? I'm currently using the 0.3 release branch.
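The "pass an offset and a constant stride and compute the index myself" alternative that Peter asks about above might look like the sketch below; the function names are illustrative, not from the thread, and it reuses the same update rule as his test code:

    # Illustrative sketch: update along the sliced dimension of a 3-d array using only
    # a linear offset and a constant stride, with no view object at all.
    function update_strided!(a::Array, off::Int, st::Int, n::Int)
        for k = 1:n
            a[off + (k - 1)*st] = -10*off + k   # same update rule as in the test code
        end
        a
    end

    function setk_manual!{T}(a::Array{T,3})
        st = stride(a, 3)                    # constant stride along the third dimension
        n  = size(a, 3)
        for j = 1:size(a, 2), i = 1:size(a, 1)
            off = sub2ind(size(a), i, j, 1)  # linear index of a[i, j, 1]
            update_strided!(a, off, st, n)
        end
        a
    end

This is allocation-free by construction and is essentially what René's pointer-updating views and the UnsafeSlice hack do, but the offset/stride bookkeeping has to be threaded through every call site, which is exactly the trade-off the thread is weighing.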
