This is an expansion of a comment I made in my previous email, in hopes we 
might have a separate discussion of it. Consider the documentation 
(emphasis mine).

Operations on AbstractArray objects are defined using higher level 
> operators and functions, *in a way that is independent of the underlying 
> storage*. These operations generally work correctly as a fallback for any 
> specific array implementation... The AbstractArray type includes anything 
> vaguely array-like, and implementations of it might be quite different from 
> conventional arrays.


Given the intent to allow AbstractArray to be extremely flexible, some of 
the default implementations seem unnecessarily specialized and potentially 
very inefficient. For instance, in theory, hcat and vcat are just cat(2,...) 
and cat(1,...), so an AbstractArray implementation could be quite short for 
these methods. But hcat has the following specializations:

function hcat{T}(V::AbstractVector{T}...)
function vcat{T}(V::AbstractVector{T}...)
function hcat{T}(A::AbstractVecOrMat{T}...)
function vcat{T}(A::AbstractMatrix{T}...)

I can *totally* understand why these specializations are worthwhile for 
DenseArray or Array objects. For instance, in the first three of these 
specalizations, you can take advantage of the fact that, when using Fortran 
ordering, the input matrices will be laid out *linearly and consecutively *in 
the output matrix. And indeed, if you look at the implementation of these 
methods, the implementations are taking advantage of this fact.

But what happens if the concrete matrix uses C ordering, or perhaps some 
sort of novel/compressed/distributed storage method? These implementations 
be suboptimal---and not just because of the layout assumption, but also 
because they use straight for-loops or list comprehensions to move elements 
*one 
element at a time. *Again, for Array objects, those loops are going to be 
nice and tight when compiled down. But consider how inefficient that is for 
sparse matrices, for instance.

Of course, there are a couple of potential solutions for the creator of a 
concrete array class. One is to study abstractarray.jl carefully and 
override every single specialized method provided. Of course, this 
increases the work required of the type designer, and is somewhat counter 
to the purpose of having a truly abstract class. And this issue is made 
more complex by the challenge of what to do about *heterogeneous *mixtures 
of abstract arrays? For instance, what to do about concatenations of sparse 
and dense arrays? In this case, I would think that the sparse array class 
should be able to decide for itself whether to emit a dense or a sparse 
array, depending on the total sparsity level of the input. That's going to 
be made more difficult by having to override multiple methods; I'm finding 
that even the invoke approach is not covering all the cases.

I would suggest, therefore, that the right solution in cases like this is:

   - Move these "specialized" methods up to the DenseArray or Array level, 
   truly keeping AbstractArray as light as possible.
   - Reimplement the AbstractArray methods that remain to *avoid 
*single-element 
   moves, preferring block moves, in case the concrete types have a faster way 
   to handle those.
   - Find an elegant solution to the heterogeneous array dispatch problem. 

On this last point, perhaps a promote_array hierarchy is in order here. 
Consider the implementation of cat in AbstractArray instance:

cat(catdim::Integer, A::AbstractArray...) = cat_t(catdim, promote_eltype(A
...), A...)

I like this: it doesn't do the *conversion *yet, but at least it 
establishes what the eltype should be. Well, what if, instead of 
promote_eltype, there was some sort of promote_array method that abstract 
array classes could override. The purpose of this method would be to 
determine not just the final eltype but the final *concrete* array class as 
well. New array classes could override promote_array as needed. And if the 
second argument of cat_t were the output of promote_array, subclasses could 
override it and provide their own storage-optimized implementation without 
having to reimplement all the dispatch.

Thoughts are welcome. If this looks promising I could even come up with a 
prototype of promote_array.

Reply via email to