@Dahua, thanks for adding unsafe_view!  I appreciate how quickly this 
community responds.

I've added the following function to my test.jl script:

function setk_unsafeview{T}(a::Array{T,3})
    for j = 1:size(a,2), i = 1:size(a,1)
        # linear, column-major offset of a[i,j,1] in the parent array
        off = sub2ind(size(a), i, j, 1)
        # unsafe_view is the new ArrayViews.jl function; `update` is my own
        # helper, defined elsewhere in test.jl
        update(unsafe_view(a, i, j, :), 3, off)
    end
    a
end
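
(For anyone following along: the sub2ind call just converts Cartesian 
indices into a linear, column-major offset. A quick sanity check, in 0.3 
syntax:

a = reshape([1:24], 4, 3, 2)       # 4x3x2 Array{Int,3}
off = sub2ind(size(a), 2, 3, 1)    # 2 + (3-1)*4 + (1-1)*12 == 10
a[off] == a[2,3,1]                 # true
)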
But I'm not seeing the large increase in performance I was expecting.  My 
timings are now:

julia> test_all(5);
test_stride
elapsed time: 2.156173128 seconds (0 bytes allocated)
test_view
elapsed time: 9.30964534 seconds (94208000 bytes allocated, 0.47% gc time)
test_unsafe
elapsed time: 2.169307471 seconds (16303000 bytes allocated)
test_unsafeview
elapsed time: 8.955876793 seconds (90112000 bytes allocated, 0.41% gc time)

To be fair, I am cheating a bit with my custom 'UnsafeSlice', since I create 
only one instance and simply update its offset on each iteration.  If I 
make it immutable and create a new instance on every iteration (as I do for 
the view and unsafeview), things slow down a little and the allocation goes 
south:

julia> test_all(5);
test_stride
elapsed time: 2.159909265 seconds (0 bytes allocated)
test_view
elapsed time: 9.029025282 seconds (94208000 bytes allocated, 0.43% gc time)
test_unsafe
elapsed time: 2.621667854 seconds (114606240 bytes allocated, 2.41% gc time)
test_unsafeview
elapsed time: 8.888434466 seconds (90112000 bytes allocated, 0.44% gc time)
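
For concreteness, the two variants look roughly like this; the field names 
are illustrative, not the exact code from my script:

# Mutable variant: one instance for the whole loop, offset patched in place.
type UnsafeSlice{T}
    ptr::Ptr{T}
    offset::Int          # linear offset of the slice's first element
    stride::Int          # distance between consecutive slice elements
    dims::(Int,Int,Int)  # parent array shape (0.3 tuple-type syntax)
end

Base.getindex(s::UnsafeSlice, i::Int) =
    unsafe_load(s.ptr, s.offset + (i - 1)*s.stride)

# Immutable variant: identical layout, but a fresh instance is constructed
# on every iteration, which is where the extra allocation above comes from.
immutable UnsafeSliceImm{T}
    ptr::Ptr{T}
    offset::Int
    stride::Int
    dims::(Int,Int,Int)
end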

These are all with 0.3.8-pre.  I'll try compiling master and see what 
happens.  I'm still confused about why allocating a single type holding a 
pointer, two Ints, and a tuple costs so much memory, though.
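
If I understand the earlier discussion correctly, the tuple field is the 
likely culprit: on 0.3, tuples are not bits types, so an immutable 
containing one can't take the fast unboxed path, and each construction 
lands on the heap. That seems to be part of what the 0.4 tuple overhaul Tim 
mentioned is meant to fix. A quick probe (again with a made-up field 
layout):

immutable Probe
    ptr::Ptr{Float64}
    offset::Int
    stride::Int
    dims::(Int,Int,Int)
end

isbits(Probe)           # false on 0.3: the tuple field disqualifies it
isbits((Int,Int,Int))   # also false on 0.3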



On Sunday, April 19, 2015 at 11:38:17 AM UTC-6, Tim Holy wrote:
>
> It's not just escape analysis, as this (new) issue demonstrates: 
> https://github.com/JuliaLang/julia/issues/10899 
>
> --Tim 
>
> On Sunday, April 19, 2015 12:33:51 PM Sebastian Good wrote: 
> > Their size seems much decreased. I’d imagine that totally avoiding 
> > allocation in this benchmark requires an optimization that really has 
> > nothing to do with subarrays per se. You’d have to do an escape analysis 
> > and see that Aj never left sumcols. Not easy in practice, since it’s 
> > passed to slice and length, and you’d have to make sure they didn’t 
> > squirrel it away or pass it on to someone else. Then you could stack 
> > allocate it, or even destructure it into a bunch of scalar mutations on 
> > the stack. After eliminating dead code, you’d end up with a 
> > no-allocation loop much like you’d write by hand. This sort of 
> > optimization seems to be quite tricky for compilers to pull off, but 
> > it’s a common pattern in numerical code. 
> > 
> > In Julia, is such cleverness left entirely to LLVM, or are there 
> > optimization passes in Julia itself? 
> > 
> > On April 19, 2015 at 6:49:21 AM, Tim Holy ([email protected]) wrote: 
> > 
> > Sorry to be slow to chime in here, but the tuple overhaul has landed and 
> > they are still not zero-cost: 
> > 
> > function sumcols(A) 
> >     s = 0.0 
> >     for j = 1:size(A,2) 
> >         Aj = slice(A, :, j) 
> >         for i = 1:length(Aj) 
> >             s += Aj[i] 
> >         end 
> >     end 
> >     s 
> > end 
> > 
> > Even in the latest 0.4, this still allocates memory. On the other hand, 
> > while SubArrays allocate nearly 2x more memory than ArrayViews, the 
> > speed of the two (replacing `slice` with `view` above) is, for me, 
> > nearly identical. 
> > 
> > --Tim 
> > 
> > On Friday, April 17, 2015 08:30:27 PM Sebastian Good wrote: 
> > > This was discussed a few weeks ago: 
> > > 
> > > https://groups.google.com/d/msg/julia-users/IxrvV8ABZoQ/uWZu5-IB3McJ 
> > > 
> > > I think the bottom line is that the current implementation *should* be 
> > > 'zero-cost' once a set of planned improvements and optimizations take 
> > > place. One of the key ones is a tuple overhaul. 
> > > 
> > > Fair to say it can never be 'zero' cost, since there is different 
> > > inherent overhead depending on the type of subarray, e.g. offset, 
> > > slice, re-dimension, etc. However, the implementation is quite clever 
> > > about allowing specialization of those. 
> > > 
> > > In a common case (e.g. a constant offset or simple stride), my 
> > > understanding is that the structure will be type-specialized and 
> > > likely stack allocated in many cases, reducing to what you'd write by 
> > > hand. At least this is what they're after. 
> > > 
> > > On Friday, April 17, 2015 at 4:24:14 PM UTC-4, Peter Brady wrote: 
> > > > Thanks for the links. I'll check out ArrayViews, as it looks like 
> > > > what I was going to do manually, without wrapping it in a type. 
> > > > 
> > > > By semi-dim agnostic I meant that the differencing algorithm itself 
> > > > only cares about one dimension, but that dimension is different for 
> > > > different directions. Only a few top-level routines actually need to 
> > > > know about the dimensionality of the problem. 
> > > > 
> > > > On Friday, April 17, 2015 at 2:04:39 PM UTC-6, René Donner wrote: 
> > > >> As far as I have measured it, sub in 0.4 is still not cheap, as it 
> > > >> provides the flexibility to deal with all kinds of strides and 
> > > >> offsets, and the view object itself thus has a certain size. See 
> > > >> https://github.com/rened/FunctionalData.jl#efficiency for a simple 
> > > >> analysis, where the speed is mostly dominated by the speed of the 
> > > >> "sub-view" mechanism. 
> > > >> 
> > > >> To get faster views which require strides etc. you can try 
> > > >> https://github.com/JuliaLang/ArrayViews.jl 
> > > >> 
> > > >> What do you mean by semi-dim agnostic? In case you only need 
> > > >> indexing along the last dimension (like a[:,:,i] and a[:,:,:,i]) 
> > > >> you can use 
> > > >> 
> > > >> https://github.com/rened/FunctionalData.jl#efficient-views-details 
> > > >> 
> > > >> which uses normal DenseArrays and simple pointer updates 
> > > >> internally. It can also update a view in-place, by just 
> > > >> incrementing the pointer. 
> > > >> On 17.04.2015 at 21:48, Peter Brady <[email protected]> wrote: 
> > > >> > In order to write some differencing algorithms in a 
> > > >> > semi-dimension-agnostic manner, the code I've written makes heavy 
> > > >> > use of subarrays, which turn out to be rather costly. I've 
> > > >> > noticed some posts here on the cost of subarrays, and that things 
> > > >> > will be better in 0.4. Can someone comment on how much better? 
> > > >> > Would a subarray (or anything like it) be on par with simply 
> > > >> > passing an offset and stride (constant) and computing the index 
> > > >> > myself? I'm currently using the 0.3 release branch. 
>
>
