Re: [julia-users] zero cost subarray?

Peter Brady Mon, 20 Apr 2015 08:05:11 -0700

Accidentally hit reply instead of reply-all. Sorry for the double post.

Ran my script in 0.4 and got these results...


julia> test_all(5)
test_stride
elapsed time: 2.008043041 seconds (0 bytes allocated)
test_view
elapsed time: 8.871387399 seconds (42 MB allocated, 0.23% gc time in 2
pauses with 1 full sweep)
test_unsafe
elapsed time: 2.308598574 seconds (46 MB allocated, 0.68% gc time in 2
pauses with 1 full sweep)
test_unsafeview
elapsed time: 9.106651158 seconds (0 bytes allocated)

julia> test_all(10)
test_stride
elapsed time: 4.012240175 seconds (0 bytes allocated)
test_view
elapsed time: 18.085514211 seconds (85 MB allocated, 0.16% gc time in 4
pauses with 1 full sweep)
test_unsafe
elapsed time: 4.477773618 seconds (93 MB allocated, 1.12% gc time in 4
pauses with 1 full sweep)
test_unsafeview
elapsed time: 18.146163969 seconds (0 bytes allocated)

So the allocation for the new unsafeview has been reduced to zero but it
has become slower than the regular view.

Perhaps the compiler optimizations that have been discussed here are
occuring since the only occurence of 'unsafeview' is the argument to a
function.

On Mon, Apr 20, 2015 at 12:57 AM, René Donner <[email protected]> wrote:

> What about something like unsafe_updateview!(view, indices...) ?
>
> It could be used like this (pseudocode):
>
>   view = unsafe_view(data, 1, 1, :)  # to construct / allocate
>   for i in ..., j in ...
>     unsafe_updateview!(view, i, j, :)
>     # use view
>   end
>
> In the trivial case of unsafe_view(data, :, :, i) this would boil down to
> a single pointer update. Of course passing around these views outside of
> their scope is rather discouraged. I use this pattern a lot and it proved
> to be very handy / fast.
>
>
>
> Am 20.04.2015 um 02:08 schrieb Dahua Lin <[email protected]>:
>
> > My benchmark shows that element indexing has been as fast as it can be
> for array views (or subarrays in Julia 0.4).
> >
> > Now the problem is actually the construction of views/subarrays. To
> optimize the overhead of this part, the compiler may need to introduce
> additional optimization.
> >
> > Dahua
> >
> >
> > On Monday, April 20, 2015 at 6:39:35 AM UTC+8, Sebastian Good wrote:
> > —track-allocation still requires guesswork, as optimizations can move
> the allocation to a different place than you would expect.
> > On April 19, 2015 at 4:36:19 PM, Peter Brady ([email protected]) wrote:
> >
> >> So I discovered the --track-allocation option and now I am really
> confused:
> >>
> >> Here's my session:
> >>
> >> $ julia --track-allocation=all
> >>                _
> >>    _       _ _(_)_     |  A fresh approach to technical computing
> >>   (_)     | (_) (_)    |  Documentation: http://docs.julialang.org
> >>    _ _   _| |_  __ _   |  Type "help()" for help.
> >>   | | | | | | |/ _` |  |
> >>   | | |_| | | | (_| |  |  Version 0.3.8-pre+13 (2015-04-17 18:08 UTC)
> >>  _/ |\__'_|_|_|\__'_|  |  Commit 0df962d* (2 days old release-0.3)
> >> |__/                   |  x86_64-redhat-linux
> >>
> >> julia> include("test.jl")
> >> test_all (generic function with 1 method)
> >>
> >> julia> test_unsafe(5)
> >>
> >> And here's the relevant part of the resulting test.jl.mem file.  Note
> that I commented out some calls to 'size' and replaced with the appropriate
> hard-coded values but the resulting allocation is the same... Can anyone
> shed some light on this while I wait for 0.4 to compile?
> >>
> >>         - function update(a::AbstractArray, idx, off)
> >>   8151120     for i=1:320 #size(a, idx)
> >>         0         a[i] = -10*off+i
> >>         -     end
> >>         0     a
> >>         - end
> >>         -
> >>        - function setk_UnSafe{T}(a::Array{T,3})
> >>       760     us = UnsafeSlice(a, 3)
> >>         0     for j=1:size(a,2),i=1:size(a,1)
> >>   8151120         us.start = (j-1)*320+i #size(a,1)+i
> >>         -         #off = sub2ind(size(a), i, j, 1)
> >>         0         update(us, 3, us.start)
> >>         -     end
> >>         0     a
> >>         - end
> >>         - function test_unsafe(n)
> >>         0     a = zeros(Int, (320, 320, 320))
> >>         -     # warmup
> >>         0     setk_UnSafe(a);
> >>         0     clear_malloc_data()
> >>         -     #@time (
> >>         0     for i=1:n; setk_UnSafe(a); end
> >>         - end
> >>
> >>
> >> On Sunday, April 19, 2015 at 2:21:56 PM UTC-6, Peter Brady wrote:
> >> @Dahua, thanks for adding an unsafeview!  I appreciate how quickly this
> community responds.
> >>
> >> I've added the following function to my test.jl script
> >> function setk_unsafeview{T}(a::Array{T,3})
> >>     for j=1:size(a,2),i=1:size(a,1)
> >>         off = sub2ind(size(a), i, j, 1)
> >>         update(unsafe_view(a, i, j, :), 3, off)
> >>     end
> >>     a
> >> end
> >>  But I'm not seeing the large increase in performance I was expecting.
> My timings are now
> >>
> >> julia> test_all(5);
> >> test_stride
> >> elapsed time: 2.156173128 seconds (0 bytes allocated)
> >> test_view
> >> elapsed time: 9.30964534 seconds (94208000 bytes allocated, 0.47% gc
> time)
> >> test_unsafe
> >> elapsed time: 2.169307471 seconds (16303000 bytes allocated)
> >> test_unsafeview
> >> elapsed time: 8.955876793 seconds (90112000 bytes allocated, 0.41% gc
> time)
> >>
> >> To be fair, I am cheating a bit with my custom 'UnsafeSlice' since I
> make only one instance and simply update the offset on each iteration.  If
> I make it immutable and create a new instance on every iteration (as I do
> for the view and unsafeview), things slow down a little and the allocation
> goes south:
> >>
> >> julia> test_all(5);
> >> test_stride
> >> elapsed time: 2.159909265 seconds (0 bytes allocated)
> >> test_view
> >> elapsed time: 9.029025282 seconds (94208000 bytes allocated, 0.43% gc
> time)
> >> test_unsafe
> >> elapsed time: 2.621667854 seconds (114606240 bytes allocated, 2.41% gc
> time)
> >> test_unsafeview
> >> elapsed time: 8.888434466 seconds (90112000 bytes allocated, 0.44% gc
> time)
> >>
> >> These are all with 0.3.8-pre.  I'll try compiling master and see what
> happens.  I'm still confused about why allocating a single type with a
> pointer, 2 ints and a tuple costs so much memory though.
> >>
> >>
> >>
> >> On Sunday, April 19, 2015 at 11:38:17 AM UTC-6, Tim Holy wrote:
> >> It's not just escape analysis, as this (new) issue demonstrates:
> >> https://github.com/JuliaLang/julia/issues/10899
> >>
> >> --Tim
> >>
> >> On Sunday, April 19, 2015 12:33:51 PM Sebastian Good wrote:
> >> > Their size seems much decreased. I’d imagine to totally avoid
> allocation in
> >> > this benchmark requires an optimization that really has nothing to do
> with
> >> > subarrays per se. You’d have to do an escape analysis and see that Aj
> never
> >> > left sumcols. Not easy in practice, since it’s passed to slice and
> length,
> >> > and you’d have to make sure they didn’t squirrel it away or pass it
> on to
> >> > someone else. Then you could stack allocate it, or even destructure
> it into
> >> > a bunch of scalar mutations on the stack. After eliminating dead code,
> >> > you’d end up with a no-allocation loop much like you’d write by hand.
> This
> >> > sort of optimization seems to be quite tricky for compilers to pull
> off,
> >> > but it’s a common pattern in numerical code.
> >> >
> >> > In Julia is such cleverness left entirely to LLVM, or are there
> optimization
> >> > passes in Julia itself? On April 19, 2015 at 6:49:21 AM, Tim Holy
> >> > ([email protected]) wrote:
> >> >
> >> > Sorry to be slow to chime in here, but the tuple overhaul has landed
> and
> >> > they are still not zero-cost:
> >> >
> >> > function sumcols(A)
> >> > s = 0.0
> >> > for j = 1:size(A,2)
> >> > Aj = slice(A, :, j)
> >> > for i = 1:length(Aj)
> >> > s += Aj[i]
> >> > end
> >> > end
> >> > s
> >> > end
> >> >
> >> > Even in the latest 0.4, this still allocates memory. On the other
> hand,
> >> > while SubArrays allocate nearly 2x more memory than ArrayViews, the
> speed
> >> > of the two (replacing `slice` with `view` above) is, for me, nearly
> >> > identical.
> >> >
> >> > --Tim
> >> >
> >> > On Friday, April 17, 2015 08:30:27 PM Sebastian Good wrote:
> >> > > This was discussed a few weeks ago
> >> > >
> >> > >
> https://groups.google.com/d/msg/julia-users/IxrvV8ABZoQ/uWZu5-IB3McJ
> >> > >
> >> > > I think the bottom line is that the current implementation *should*
> be
> >> > > 'zero-cost' once a set of planned improvements and optimizations
> take
> >> > > place. One of the key ones is a tuple overhaul.
> >> > >
> >> > > Fair to say it can never be 'zero' cost since there is different
> inherent
> >> > > overhead depending on the type of subarray, e.g. offset, slice,
> >> > > re-dimension, etc. however the implementation is quite clever about
> >> > > allowing specialization of those.
> >> > >
> >> > > In a common case (e.g. a constant offset or simple stride) my
> >> > > understanding
> >> > > is that the structure will be type-specialized and likely stack
> allocated
> >> > > in many cases, reducing to what you'd write by hand. At least this
> is what
> >> > > they're after.
> >> > >
> >> > > On Friday, April 17, 2015 at 4:24:14 PM UTC-4, Peter Brady wrote:
> >> > > > Thanks for the links. I'll check out ArrayViews as it looks like
> what I
> >> > > > was going to do manually without wrapping it in a type.
> >> > > >
> >> > > > By semi-dim agnostic I meant that the differencing algorithm
> itself only
> >> > > > cares about one dimension but that dimension is different for
> different
> >> > > > directions. Only a few toplevel routines actually need to know
> about the
> >> > > > dimensionality of the problem.
> >> > > >
> >> > > > On Friday, April 17, 2015 at 2:04:39 PM UTC-6, René Donner wrote:
> >> > > >> As far as I have measured it sub in 0.4 is still not cheap, as it
> >> > > >> provides the flexibility to deal with all kinds of strides and
> offsets,
> >> > > >> and
> >> > > >> the view object itself thus has a certain size. See
> >> > > >> https://github.com/rened/FunctionalData.jl#efficiency for a
> simple
> >> > > >> analysis, where the speed is mostly dominated by the speed of the
> >> > > >> "sub-view" mechanism.
> >> > > >>
> >> > > >> To get faster views which require strides etc you can try
> >> > > >> https://github.com/JuliaLang/ArrayViews.jl
> >> > > >>
> >> > > >> What do you mean by semi-dim agnostic? In case you only need
> indexing
> >> > > >> along the last dimension (like a[:,:,i] and a[:,:,:,i]) you can
> use
> >> > > >>
> >> > > >>
> https://github.com/rened/FunctionalData.jl#efficient-views-details
> >> > > >>
> >> > > >> which uses normal DenseArrays and simple pointer updates
> internally. It
> >> > > >> can also update a view in-place, by just incrementing the
> pointer.
> >> > > >>
> >> > > >> Am 17.04.2015 um 21:48 schrieb Peter Brady <[email protected]>:
> >> > > >> > Inorder to write some differencing algorithms in a
> semi-dimensional
> >> > > >>
> >> > > >> agnostic manner the code I've written makes heavy use of
> subarrays
> >> > > >> which
> >> > > >> turn out to be rather costly. I've noticed some posts on the
> cost of
> >> > > >> subarrays here and that things will be better in 0.4. Can someone
> >> > > >> comment
> >> > > >> on how much better? Would subarray (or anything like it) be on
> par with
> >> > > >> simply passing an offset and stride (constant) and computing the
> index
> >> > > >> myself? I'm currently using the 0.3 release branch.
> >>
>
>

Re: [julia-users] zero cost subarray?

Reply via email to