So I've done some experimenting, and I think my use case lends itself to 
a custom datatype (basically a Ptr with an offset and a stride), named 
'UnsafeSlice' below. The timings are pretty close to simply passing the 
array and offset/stride, but the memory allocation is way off.

I've posted my test.jl script below, and I'm not sure where all the 
allocations are coming from.  Any help would be greatly appreciated.
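For what it's worth, here is the kind of quick check I've been using to see 
whether a single call allocates (just a sketch; `sum` is a stand-in for 
whichever routine is actually under suspicion):

```julia
# Sketch: measure per-call allocations after a warm-up run, so that
# compile-time allocations aren't counted.  `sum` is just a stand-in
# for whichever routine is under suspicion.
a = zeros(Int, 8, 8, 8)
sum(a)                     # warm up: compile before measuring
bytes = @allocated sum(a)  # bytes allocated by this one call
println(bytes, " bytes")
```

Running julia with --track-allocation=user and inspecting the generated 
*.mem files also gives a per-line breakdown.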

Here's the output of running the script on my machine:
julia> include("test.jl")
test_all (generic function with 1 method)

julia> test_all(5)
test_stride
elapsed time: 1.596122539 seconds (0 bytes allocated)
test_view
elapsed time: 9.502840329 seconds (94208000 bytes allocated, 0.38% gc time)
test_unsafe
elapsed time: 1.683452985 seconds (16303000 bytes allocated)

Here's my test.jl script:
using ArrayViews
import Base: size, getindex, setindex!, ndims
type UnsafeSlice{T,N,P<:Ptr} <: AbstractArray{T,N}
    p::P
    start::Int
    stride::Int
    size::NTuple{N,Int}
end
size(s::UnsafeSlice) = s.size
size(s::UnsafeSlice, i::Int) = s.size[i]
ndims{T,N}(s::UnsafeSlice{T,N}) = N
getindex(s::UnsafeSlice, i::Int) = unsafe_load(s.p, s.start+(i-1)*s.stride)
setindex!(s::UnsafeSlice, x, i::Int) =
    unsafe_store!(s.p, x, s.start+(i-1)*s.stride)

function UnsafeSlice(a, slicedim::Int, start=1)
    p = pointer(a)
    str = stride(a, slicedim)
    UnsafeSlice{eltype(a), ndims(a), typeof(p)}(p, start, str, size(a))
end

function update(a::UnsafeSlice, idx, off)
    for i=1:size(a, idx)
        a[i] = -10*off+i
    end
    a
end
function update(a::AbstractArray, idx, off)
    for i=1:size(a, idx)
        a[i] = -10*off+i
    end
    a
end
function update(a::Array, idx, off, stride)
    for i=1:size(a, idx)
        a[off+(i-1)*stride] = -10*off+i
    end
    a
end
function setk_UnSafe{T}(a::Array{T,3})
    us = UnsafeSlice(a, 3)
    for j=1:size(a,2),i=1:size(a,1)
        us.start = (j-1)*size(a,1)+i
        #off = sub2ind(size(a), i, j, 1)
        update(us, 3, us.start)
    end
    a
end
function setk_View{T}(a::Array{T,3})
    for j=1:size(a,2),i=1:size(a,1)
        off = sub2ind(size(a), i, j, 1)
        update(view(a, i, j, :), 3, off)
    end
    a
end
function setk_Stride{T}(a::Array{T,3})
    str = stride(a, 3)
    for j=1:size(a,2),i=1:size(a,1)
        off = (j-1)*size(a,1)+i
        update(a, 3, off, str)
    end
    a
end

function test_stride(n)
    a = zeros(Int, (320, 320, 320))
    # warmup
    setk_Stride(a);
    @time (for i=1:n; setk_Stride(a); end)
end
function test_view(n)
    a = zeros(Int, (320, 320, 320))
    # warmup
    setk_View(a);
    @time (for i=1:n; setk_View(a); end)
end
function test_unsafe(n)
    a = zeros(Int, (320, 320, 320))
    # warmup
    setk_UnSafe(a);
    @time (for i=1:n; setk_UnSafe(a); end)
end

function test_all(n)
    println("test_stride")
    test_stride(n)
    println("test_view")
    test_view(n)
    println("test_unsafe")
    test_unsafe(n)
end
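
As a sanity check on the offset/stride arithmetic used in setk_Stride and 
setk_UnSafe above, the manual linear index can be compared against plain 
Cartesian indexing on a small array (a minimal sketch, separate from the 
benchmark):

```julia
# Sketch: verify that off + (k-1)*stride(a,3), with off = (j-1)*size(a,1)+i,
# is the linear index of a[i,j,k] for a plain dense Array.
function check_offsets(a)
    str = stride(a, 3)   # linear-index distance from a[i,j,k] to a[i,j,k+1]
    for k in 1:size(a,3), j in 1:size(a,2), i in 1:size(a,1)
        off = (j-1)*size(a,1) + i        # linear index of a[i,j,1]
        a[off + (k-1)*str] == a[i,j,k] || return false
    end
    return true
end

a = reshape(collect(1:24), 2, 3, 4)
println(check_offsets(a))  # prints true
```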



On Friday, April 17, 2015 at 9:30:27 PM UTC-6, Sebastian Good wrote:
>
> This was discussed a few weeks ago
>
> https://groups.google.com/d/msg/julia-users/IxrvV8ABZoQ/uWZu5-IB3McJ
>
> I think the bottom line is that the current implementation *should* be 
> 'zero-cost' once a set of planned improvements and optimizations take 
> place. One of the key ones is a tuple overhaul.
>
> Fair to say it can never be 'zero' cost since there is different inherent 
> overhead depending on the type of subarray, e.g. offset, slice, 
> re-dimension, etc. However, the implementation is quite clever about 
> allowing specialization of those.
>
> In a common case (e.g. a constant offset or simple stride) my 
> understanding is that the structure will be type-specialized and likely 
> stack allocated in many cases, reducing to what you'd write by hand. At 
> least this is what they're after.
>
> On Friday, April 17, 2015 at 4:24:14 PM UTC-4, Peter Brady wrote:
>>
>> Thanks for the links.  I'll check out ArrayViews as it looks like what I 
>> was going to do manually without wrapping it in a type.
>>
>> By semi-dim agnostic I meant that the differencing algorithm itself only 
>> cares about one dimension but that dimension is different for different 
>> directions. Only a few toplevel routines actually need to know about the 
>> dimensionality of the problem. 
>>
>> On Friday, April 17, 2015 at 2:04:39 PM UTC-6, René Donner wrote:
>>>
>>> As far as I have measured it, sub in 0.4 is still not cheap, as it 
>>> provides the flexibility to deal with all kinds of strides and offsets, and 
>>> the view object itself thus has a certain size. See 
>>> https://github.com/rened/FunctionalData.jl#efficiency for a simple 
>>> analysis, where the speed is mostly dominated by the speed of the 
>>> "sub-view" mechanism. 
>>>
>>> To get faster views which require strides etc you can try 
>>> https://github.com/JuliaLang/ArrayViews.jl 
>>>
>>> What do you mean by semi-dim agnostic? In case you only need indexing 
>>> along the last dimension (like a[:,:,i] and a[:,:,:,i]) you can use 
>>>   https://github.com/rened/FunctionalData.jl#efficient-views-details 
>>> which uses normal DenseArrays and simple pointer updates internally. It 
>>> can also update a view in-place, by just incrementing the pointer. 
>>>
>>>
>>>
>>> Am 17.04.2015 um 21:48 schrieb Peter Brady <[email protected]>: 
>>>
>>> > In order to write some differencing algorithms in a semi-dimension-
>>> agnostic manner, the code I've written makes heavy use of subarrays, which 
>>> turn out to be rather costly. I've noticed some posts on the cost of 
>>> subarrays here and that things will be better in 0.4.  Can someone comment 
>>> on how much better?  Would subarray (or anything like it) be on par with 
>>> simply passing an offset and stride (constant) and computing the index 
>>> myself? I'm currently using the 0.3 release branch. 
>>>
>>>
