Hi Matthew,

Great questions.

On Monday, February 17, 2014 10:44:12 AM Matthew Frank wrote:
> I have been trying to use SubArrays to reduce the number of matrix copies
> that my code is doing, but I've been surprised to find that replacing
> slices and extra copies with SubArrays seems to slow things down.

Julia's handling of SubArrays has long been...suboptimal, but it's getting 
better. If you had tried this a year ago, you would have seen slowdowns of 
30x. There's still room for improvement, however. This is why A[indx1, indx2] 
currently returns a copy rather than a SubArray.
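For example (just a small sketch; A here is a little stand-in matrix, not your image):
    A = rand(4, 6)          # small stand-in array
    c = A[:, 1:3]           # plain indexing makes a copy...
    c[1,1] = 0.0            # ...so this leaves A untouched
    v = sub(A, :, 1:3)      # sub() returns a SubArray view...
    v[1,1] = 0.0            # ...so this writes through to A[1,1]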

> My first question is what is the difference between the "+" operator and
> the (AFAIK undocumented) ".+" operator for SubArrays?  .+ seems to be about
> 3x faster, although the output seems to be the same.  (img is a 2048x3072
> Array{Float32,2}).
> 
> julia> @time result_no_dot = sub(img,:,1:3072-2) + sub(img,:,3:3072)
> elapsed time: 0.182192137 seconds (25151376 bytes allocated)
> 
> julia> @time result_with_dot = sub(img,:,1:3072-2) .+ sub(img,:,3:3072)
> elapsed time: 0.060138815 seconds (25152360 bytes allocated)

The best way to diagnose this kind of thing is to use @which:
julia> @which  sub(img,:,1:3072-2) + sub(img,:,3:3072)
+{S,T}(A::Union(DenseArray{S,N},SubArray{S,N,A<:DenseArray{T,N},I<:
(Union(Range{Int64},Range1{Int64},Int64)...,)}),B::Union(DenseArray{T,N},SubArray{T,N,A<:DenseArray{T,N},I<:
(Union(Range{Int64},Range1{Int64},Int64)...,)})) at array.jl:754

julia> @which sub(img,:,1:3072-2) .+ sub(img,:,3:3072)
.+(As::AbstractArray{T,N}...) at broadcast.jl:148

So as you surmised, they're calling two different functions. All "dot" 
operators broadcast, so it's no surprise that the second one lands in broadcast.jl.
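Broadcasting also means the two operands don't have to be the same size. A quick 
sketch (a toy example of mine, not anything from your code):
    A = rand(3, 4)
    row = rand(1, 4)
    B = A .+ row     # the 1x4 row is broadcast across each of A's 3 rows; size(B) == (3, 4)
    # A + row        # errors: plain + requires the sizes to match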

The issue with the first can be traced to these lines:
            for i=1:length(A)
                @inbounds F[i] = ($f)(A[i], B[i])
            end
Here f = +, so it's summing them. The problem is that linear indexing (A[i]) 
is slow for SubArrays.
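You can see that for yourself with something like this (my own toy benchmark, so 
the exact numbers will depend on your machine and Julia version):
    S = sub(rand(2048, 3072), :, 1:3070)

    function sum_linear(S)            # linear indexing: the slow path for SubArrays
        s = 0.0
        for i = 1:length(S)
            s += S[i]
        end
        s
    end

    function sum_cartesian(S)         # cartesian (i,j) indexing: much cheaper
        s = 0.0
        for j = 1:size(S,2), i = 1:size(S,1)
            s += S[i,j]
        end
        s
    end

    @time sum_linear(S)
    @time sum_cartesian(S)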

We're partway through a transition to Cartesian indexing, but not all 
functions have been fixed yet.

> 
> julia> result_no_dot == result_with_dot
> true
> 
> I also don't understand what the profiler is telling me.  With the "+"
> operator I get:
> 
> 9  multidimensional.jl; getindex; line: 119
> 1  multidimensional.jl; getindex; line: 120
> 83 multidimensional.jl; getindex; line: 123
> 15 multidimensional.jl; getindex; line: 124
> 2  multidimensional.jl; getindex; line: 125
> 26 multidimensional.jl; getindex; line: 127
>               30 profile.jl; anonymous; line: 14
>                 21 array.jl; +; line: 756
>                 7  multidimensional.jl; getindex; line: 119

Ugh, you've gotten bitten by a bad case of truncated backtraces (issue #3469). 
That indeed makes this pretty hard to interpret. These should go away with the 
MCJIT work that's ongoing (but it's a herculean effort).

> My second question is, is there a way to avoid doing any allocation at all
> (short of de-vectorizing)?  Even when I preallocate an array for the
> result, the "+" and ".+" operators (or maybe the assignment operator) seem
> to create an extra copy of the array:

Sure, just "devectorize", e.g.,
    # fills the preallocated my_result in place; no temporary array is created
    for j = 1:size(A,2), i = 1:size(A,1)
        my_result[i,j] = A[i,j] + B[i,j]
    end

> My third question is what is the slicing operator doing that makes it so
> much faster than the sub operator, even though it has to make an extra copy?

It's all because everything is better optimized for Arrays than SubArrays 
right now. You could try the devectorized loop above, add an @inbounds (after 
first doing suitable bounds-checking), and see whether that closes the gap.
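Something like this, for instance (just a sketch; add! is a name I made up, and 
the explicit size check up front is what makes the @inbounds in the loop safe):
    function add!(result, A, B)
        size(result) == size(A) == size(B) || error("arrays must have matching sizes")
        for j = 1:size(A,2), i = 1:size(A,1)
            @inbounds result[i,j] = A[i,j] + B[i,j]
        end
        result
    end

    result = similar(img, size(img,1), size(img,2)-2)    # preallocate once
    add!(result, sub(img,:,1:3072-2), sub(img,:,3:3072))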

--Tim
