My benchmark shows that element indexing has been as fast as it can be for 
array views (or subarrays in Julia 0.4). 

Now the problem is actually the construction of views/subarrays. To 
optimize the overhead of this part, the compiler may need to introduce 
additional optimization.

Dahua 


On Monday, April 20, 2015 at 6:39:35 AM UTC+8, Sebastian Good wrote:
>
> —track-allocation still requires guesswork, as optimizations can move the 
> allocation to a different place than you would expect.
>
> On April 19, 2015 at 4:36:19 PM, Peter Brady ([email protected] 
> <javascript:>) wrote:
>
> So I discovered the --track-allocation option and now I am really 
> confused: 
>
> Here's my session:
>
>  $ julia --track-allocation=all
>                _
>    _       _ _(_)_     |  A fresh approach to technical computing
>   (_)     | (_) (_)    |  Documentation: http://docs.julialang.org
>    _ _   _| |_  __ _   |  Type "help()" for help.
>   | | | | | | |/ _` |  |
>   | | |_| | | | (_| |  |  Version 0.3.8-pre+13 (2015-04-17 18:08 UTC)
>  _/ |\__'_|_|_|\__'_|  |  Commit 0df962d* (2 days old release-0.3)
> |__/                   |  x86_64-redhat-linux
>
> julia> include("test.jl")
> test_all (generic function with 1 method)
>
> julia> test_unsafe(5)
>  
> And here's the relevant part of the resulting test.jl.mem file.  Note that 
> I commented out some calls to 'size' and replaced with the appropriate 
> hard-coded values but the resulting allocation is the same... Can anyone 
> shed some light on this while I wait for 0.4 to compile?
>
>           - function update(a::AbstractArray, idx, off)
>   8151120     for i=1:320 #size(a, idx)
>         0         a[i] = -10*off+i
>         -     end
>         0     a
>         - end
>         - 
>        - function setk_UnSafe{T}(a::Array{T,3})
>       760     us = UnsafeSlice(a, 3)
>         0     for j=1:size(a,2),i=1:size(a,1)
>   8151120         us.start = (j-1)*320+i #size(a,1)+i
>         -         #off = sub2ind(size(a), i, j, 1)
>         0         update(us, 3, us.start)
>         -     end
>         0     a
>         - end
>         - function test_unsafe(n)
>         0     a = zeros(Int, (320, 320, 320))
>         -     # warmup
>         0     setk_UnSafe(a);
>         0     clear_malloc_data()
>         -     #@time (
>         0     for i=1:n; setk_UnSafe(a); end
>         - end
>   
>
> On Sunday, April 19, 2015 at 2:21:56 PM UTC-6, Peter Brady wrote: 
>>
>> @Dahua, thanks for adding an unsafeview!  I appreciate how quickly this 
>> community responds. 
>>
>> I've added the following function to my test.jl script
>>  function setk_unsafeview{T}(a::Array{T,3})
>>     for j=1:size(a,2),i=1:size(a,1)
>>         off = sub2ind(size(a), i, j, 1)
>>         update(unsafe_view(a, i, j, :), 3, off)
>>     end
>>     a
>> end
>>   But I'm not seeing the large increase in performance I was expecting. 
>>  My timings are now
>>
>>  julia> test_all(5);
>> test_stride
>> elapsed time: 2.156173128 seconds (0 bytes allocated)
>> test_view
>> elapsed time: 9.30964534 seconds (94208000 bytes allocated, 0.47% gc time)
>> test_unsafe
>> elapsed time: 2.169307471 seconds (16303000 bytes allocated)
>> test_unsafeview
>> elapsed time: 8.955876793 seconds (90112000 bytes allocated, 0.41% gc 
>> time)
>>  
>> To be fair, I am cheating a bit with my custom 'UnsafeSlice' since I make 
>> only one instance and simply update the offset on each iteration.  If I 
>> make it immutable and create a new instance on every iteration (as I do for 
>> the view and unsafeview), things slow down a little and the allocation goes 
>> south:
>>
>>   julia> test_all(5);
>> test_stride
>> elapsed time: 2.159909265 seconds (0 bytes allocated)
>> test_view
>> elapsed time: 9.029025282 seconds (94208000 bytes allocated, 0.43% gc 
>> time)
>> test_unsafe
>> elapsed time: 2.621667854 seconds (114606240 bytes allocated, 2.41% gc 
>> time)
>> test_unsafeview
>> elapsed time: 8.888434466 seconds (90112000 bytes allocated, 0.44% gc 
>> time)
>>  
>> These are all with 0.3.8-pre.  I'll try compiling master and see what 
>> happens.  I'm still confused about why allocating a single type with a 
>> pointer, 2 ints and a tuple costs so much memory though.
>>
>>
>>
>> On Sunday, April 19, 2015 at 11:38:17 AM UTC-6, Tim Holy wrote: 
>>>
>>> It's not just escape analysis, as this (new) issue demonstrates:
>>>  https://github.com/JuliaLang/julia/issues/10899
>>>
>>> --Tim
>>>
>>> On Sunday, April 19, 2015 12:33:51 PM Sebastian Good wrote:
>>> > Their size seems much decreased. I’d imagine to totally avoid 
>>> allocation in
>>> > this benchmark requires an optimization that really has nothing to do 
>>> with
>>> > subarrays per se. You’d have to do an escape analysis and see that Aj 
>>> never
>>> > left sumcols. Not easy in practice, since it’s passed to slice and 
>>> length,
>>> > and you’d have to make sure they didn’t squirrel it away or pass it on 
>>> to
>>> > someone else. Then you could stack allocate it, or even destructure it 
>>> into
>>> > a bunch of scalar mutations on the stack. After eliminating dead code,
>>> > you’d end up with a no-allocation loop much like you’d write by hand. 
>>> This
>>> > sort of optimization seems to be quite tricky for compilers to pull 
>>> off,
>>> > but it’s a common pattern in numerical code.
>>> >
>>> > In Julia is such cleverness left entirely to LLVM, or are there 
>>> optimization
>>> > passes in Julia itself? On April 19, 2015 at 6:49:21 AM, Tim Holy
>>> > ([email protected]) wrote:
>>> >
>>> > Sorry to be slow to chime in here, but the tuple overhaul has landed 
>>> and
>>> > they are still not zero-cost:
>>> >
>>> > function sumcols(A)
>>> > s = 0.0
>>> > for j = 1:size(A,2)
>>> > Aj = slice(A, :, j)
>>> > for i = 1:length(Aj)
>>> > s += Aj[i]
>>> > end
>>> > end
>>> > s
>>> > end
>>> >
>>> > Even in the latest 0.4, this still allocates memory. On the other hand,
>>> > while SubArrays allocate nearly 2x more memory than ArrayViews, the 
>>> speed
>>> > of the two (replacing `slice` with `view` above) is, for me, nearly
>>> > identical.
>>> >
>>> > --Tim
>>> >
>>> > On Friday, April 17, 2015 08:30:27 PM Sebastian Good wrote:
>>> > > This was discussed a few weeks ago
>>> > >
>>> > > https://groups.google.com/d/msg/julia-users/IxrvV8ABZoQ/uWZu5-IB3McJ
>>> > >
>>> > > I think the bottom line is that the current implementation *should* 
>>> be
>>> > > 'zero-cost' once a set of planned improvements and optimizations take
>>> > > place. One of the key ones is a tuple overhaul.
>>> > >
>>> > > Fair to say it can never be 'zero' cost since there is different 
>>> inherent
>>> > > overhead depending on the type of subarray, e.g. offset, slice,
>>> > > re-dimension, etc. however the implementation is quite clever about
>>> > > allowing specialization of those.
>>> > >
>>> > > In a common case (e.g. a constant offset or simple stride) my
>>> > > understanding
>>> > > is that the structure will be type-specialized and likely stack 
>>> allocated
>>> > > in many cases, reducing to what you'd write by hand. At least this 
>>> is what
>>> > > they're after.
>>> > >
>>> > > On Friday, April 17, 2015 at 4:24:14 PM UTC-4, Peter Brady wrote:
>>> > > > Thanks for the links. I'll check out ArrayViews as it looks like 
>>> what I
>>> > > > was going to do manually without wrapping it in a type.
>>> > > >
>>> > > > By semi-dim agnostic I meant that the differencing algorithm 
>>> itself only
>>> > > > cares about one dimension but that dimension is different for 
>>> different
>>> > > > directions. Only a few toplevel routines actually need to know 
>>> about the
>>> > > > dimensionality of the problem.
>>> > > >
>>> > > > On Friday, April 17, 2015 at 2:04:39 PM UTC-6, René Donner wrote:
>>> > > >> As far as I have measured it sub in 0.4 is still not cheap, as it
>>> > > >> provides the flexibility to deal with all kinds of strides and 
>>> offsets,
>>> > > >> and
>>> > > >> the view object itself thus has a certain size. See
>>> > > >> https://github.com/rened/FunctionalData.jl#efficiency for a 
>>> simple
>>> > > >> analysis, where the speed is mostly dominated by the speed of the
>>> > > >> "sub-view" mechanism.
>>> > > >>
>>> > > >> To get faster views which require strides etc you can try
>>> > > >> https://github.com/JuliaLang/ArrayViews.jl
>>> > > >>
>>> > > >> What do you mean by semi-dim agnostic? In case you only need 
>>> indexing
>>> > > >> along the last dimension (like a[:,:,i] and a[:,:,:,i]) you can 
>>> use
>>> > > >>
>>> > > >> 
>>> https://github.com/rened/FunctionalData.jl#efficient-views-details
>>> > > >>
>>> > > >> which uses normal DenseArrays and simple pointer updates 
>>> internally. It
>>> > > >> can also update a view in-place, by just incrementing the pointer.
>>> > > >>
>>> > > >> Am 17.04.2015 um 21:48 schrieb Peter Brady <[email protected]>:
>>> > > >> > Inorder to write some differencing algorithms in a 
>>> semi-dimensional
>>> > > >>
>>> > > >> agnostic manner the code I've written makes heavy use of subarrays
>>> > > >> which
>>> > > >> turn out to be rather costly. I've noticed some posts on the cost 
>>> of
>>> > > >> subarrays here and that things will be better in 0.4. Can someone
>>> > > >> comment
>>> > > >> on how much better? Would subarray (or anything like it) be on 
>>> par with
>>> > > >> simply passing an offset and stride (constant) and computing the 
>>> index
>>> > > >> myself? I'm currently using the 0.3 release branch.
>>>
>>>   

Reply via email to