So I discovered the --track-allocation option and now I am really confused:
Here's my session:
$ julia --track-allocation=all
_
_ _ _(_)_ | A fresh approach to technical computing
(_) | (_) (_) | Documentation: http://docs.julialang.org
_ _ _| |_ __ _ | Type "help()" for help.
| | | | | | |/ _` | |
| | |_| | | | (_| | | Version 0.3.8-pre+13 (2015-04-17 18:08 UTC)
_/ |\__'_|_|_|\__'_| | Commit 0df962d* (2 days old release-0.3)
|__/ | x86_64-redhat-linux
julia> include("test.jl")
test_all (generic function with 1 method)
julia> test_unsafe(5)
And here's the relevant part of the resulting test.jl.mem file. Note that
I commented out some calls to 'size' and replaced with the appropriate
hard-coded values but the resulting allocation is the same... Can anyone
shed some light on this while I wait for 0.4 to compile?
- function update(a::AbstractArray, idx, off)
8151120 for i=1:320 #size(a, idx)
0 a[i] = -10*off+i
- end
0 a
- end
-
- function setk_UnSafe{T}(a::Array{T,3})
760 us = UnsafeSlice(a, 3)
0 for j=1:size(a,2),i=1:size(a,1)
8151120 us.start = (j-1)*320+i #size(a,1)+i
- #off = sub2ind(size(a), i, j, 1)
0 update(us, 3, us.start)
- end
0 a
- end
- function test_unsafe(n)
0 a = zeros(Int, (320, 320, 320))
- # warmup
0 setk_UnSafe(a);
0 clear_malloc_data()
- #@time (
0 for i=1:n; setk_UnSafe(a); end
- end
On Sunday, April 19, 2015 at 2:21:56 PM UTC-6, Peter Brady wrote:
>
> @Dahua, thanks for adding an unsafeview! I appreciate how quickly this
> community responds.
>
> I've added the following function to my test.jl script
> function setk_unsafeview{T}(a::Array{T,3})
> for j=1:size(a,2),i=1:size(a,1)
> off = sub2ind(size(a), i, j, 1)
> update(unsafe_view(a, i, j, :), 3, off)
> end
> a
> end
> But I'm not seeing the large increase in performance I was expecting. My
> timings are now
>
> julia> test_all(5);
> test_stride
> elapsed time: 2.156173128 seconds (0 bytes allocated)
> test_view
> elapsed time: 9.30964534 seconds (94208000 bytes allocated, 0.47% gc time)
> test_unsafe
> elapsed time: 2.169307471 seconds (16303000 bytes allocated)
> test_unsafeview
> elapsed time: 8.955876793 seconds (90112000 bytes allocated, 0.41% gc time)
>
> To be fair, I am cheating a bit with my custom 'UnsafeSlice' since I make
> only one instance and simply update the offset on each iteration. If I
> make it immutable and create a new instance on every iteration (as I do for
> the view and unsafeview), things slow down a little and the allocation goes
> south:
>
> julia> test_all(5);
> test_stride
> elapsed time: 2.159909265 seconds (0 bytes allocated)
> test_view
> elapsed time: 9.029025282 seconds (94208000 bytes allocated, 0.43% gc time)
> test_unsafe
> elapsed time: 2.621667854 seconds (114606240 bytes allocated, 2.41% gc
> time)
> test_unsafeview
> elapsed time: 8.888434466 seconds (90112000 bytes allocated, 0.44% gc time)
>
> These are all with 0.3.8-pre. I'll try compiling master and see what
> happens. I'm still confused about why allocating a single type with a
> pointer, 2 ints and a tuple costs so much memory though.
>
>
>
> On Sunday, April 19, 2015 at 11:38:17 AM UTC-6, Tim Holy wrote:
>>
>> It's not just escape analysis, as this (new) issue demonstrates:
>> https://github.com/JuliaLang/julia/issues/10899
>>
>> --Tim
>>
>> On Sunday, April 19, 2015 12:33:51 PM Sebastian Good wrote:
>> > Their size seems much decreased. I’d imagine to totally avoid
>> allocation in
>> > this benchmark requires an optimization that really has nothing to do
>> with
>> > subarrays per se. You’d have to do an escape analysis and see that Aj
>> never
>> > left sumcols. Not easy in practice, since it’s passed to slice and
>> length,
>> > and you’d have to make sure they didn’t squirrel it away or pass it on
>> to
>> > someone else. Then you could stack allocate it, or even destructure it
>> into
>> > a bunch of scalar mutations on the stack. After eliminating dead code,
>> > you’d end up with a no-allocation loop much like you’d write by hand.
>> This
>> > sort of optimization seems to be quite tricky for compilers to pull
>> off,
>> > but it’s a common pattern in numerical code.
>> >
>> > In Julia is such cleverness left entirely to LLVM, or are there
>> optimization
>> > passes in Julia itself? On April 19, 2015 at 6:49:21 AM, Tim Holy
>> > ([email protected]) wrote:
>> >
>> > Sorry to be slow to chime in here, but the tuple overhaul has landed
>> and
>> > they are still not zero-cost:
>> >
>> > function sumcols(A)
>> > s = 0.0
>> > for j = 1:size(A,2)
>> > Aj = slice(A, :, j)
>> > for i = 1:length(Aj)
>> > s += Aj[i]
>> > end
>> > end
>> > s
>> > end
>> >
>> > Even in the latest 0.4, this still allocates memory. On the other hand,
>> > while SubArrays allocate nearly 2x more memory than ArrayViews, the
>> speed
>> > of the two (replacing `slice` with `view` above) is, for me, nearly
>> > identical.
>> >
>> > --Tim
>> >
>> > On Friday, April 17, 2015 08:30:27 PM Sebastian Good wrote:
>> > > This was discussed a few weeks ago
>> > >
>> > > https://groups.google.com/d/msg/julia-users/IxrvV8ABZoQ/uWZu5-IB3McJ
>> > >
>> > > I think the bottom line is that the current implementation *should*
>> be
>> > > 'zero-cost' once a set of planned improvements and optimizations take
>> > > place. One of the key ones is a tuple overhaul.
>> > >
>> > > Fair to say it can never be 'zero' cost since there is different
>> inherent
>> > > overhead depending on the type of subarray, e.g. offset, slice,
>> > > re-dimension, etc. however the implementation is quite clever about
>> > > allowing specialization of those.
>> > >
>> > > In a common case (e.g. a constant offset or simple stride) my
>> > > understanding
>> > > is that the structure will be type-specialized and likely stack
>> allocated
>> > > in many cases, reducing to what you'd write by hand. At least this is
>> what
>> > > they're after.
>> > >
>> > > On Friday, April 17, 2015 at 4:24:14 PM UTC-4, Peter Brady wrote:
>> > > > Thanks for the links. I'll check out ArrayViews as it looks like
>> what I
>> > > > was going to do manually without wrapping it in a type.
>> > > >
>> > > > By semi-dim agnostic I meant that the differencing algorithm itself
>> only
>> > > > cares about one dimension but that dimension is different for
>> different
>> > > > directions. Only a few toplevel routines actually need to know
>> about the
>> > > > dimensionality of the problem.
>> > > >
>> > > > On Friday, April 17, 2015 at 2:04:39 PM UTC-6, René Donner wrote:
>> > > >> As far as I have measured it sub in 0.4 is still not cheap, as it
>> > > >> provides the flexibility to deal with all kinds of strides and
>> offsets,
>> > > >> and
>> > > >> the view object itself thus has a certain size. See
>> > > >> https://github.com/rened/FunctionalData.jl#efficiency for a
>> simple
>> > > >> analysis, where the speed is mostly dominated by the speed of the
>> > > >> "sub-view" mechanism.
>> > > >>
>> > > >> To get faster views which require strides etc you can try
>> > > >> https://github.com/JuliaLang/ArrayViews.jl
>> > > >>
>> > > >> What do you mean by semi-dim agnostic? In case you only need
>> indexing
>> > > >> along the last dimension (like a[:,:,i] and a[:,:,:,i]) you can
>> use
>> > > >>
>> > > >> https://github.com/rened/FunctionalData.jl#efficient-views-details
>> > > >>
>> > > >> which uses normal DenseArrays and simple pointer updates
>> internally. It
>> > > >> can also update a view in-place, by just incrementing the pointer.
>> > > >>
>> > > >> Am 17.04.2015 um 21:48 schrieb Peter Brady <[email protected]>:
>> > > >> > Inorder to write some differencing algorithms in a
>> semi-dimensional
>> > > >>
>> > > >> agnostic manner the code I've written makes heavy use of subarrays
>> > > >> which
>> > > >> turn out to be rather costly. I've noticed some posts on the cost
>> of
>> > > >> subarrays here and that things will be better in 0.4. Can someone
>> > > >> comment
>> > > >> on how much better? Would subarray (or anything like it) be on par
>> with
>> > > >> simply passing an offset and stride (constant) and computing the
>> index
>> > > >> myself? I'm currently using the 0.3 release branch.
>>
>>