I'm not an expert in this area, and haven't had to extend AbstractArray
myself, but what you are saying makes sense.
I think the best place to continue this would be as a Github Issue though,
as we can more easily cross-reference it with other relevant issues (of
which there are many, if I recall correctly).
On Wednesday, August 6, 2014 10:10:43 AM UTC-4, Michael Grant wrote:
>
> This is an expansion of a comment I made in my previous email, in hopes we
> might have a separate discussion of it. Consider the documentation
> (emphasis mine).
>
> Operations on AbstractArray objects are defined using higher level
>> operators and functions, *in a way that is independent of the underlying
>> storage*. These operations generally work correctly as a fallback for
>> any specific array implementation... The AbstractArray type includes
>> anything vaguely array-like, and implementations of it might be quite
>> different from conventional arrays.
>
>
> Given the intent to allow AbstractArray to be extremely flexible, some of
> the default implementations seem unnecessarily specialized and potentially
> very inefficient. For instance, in theory, hcat and vcat are just
> cat(2,...) and cat(1,...), so an AbstractArray implementation could be
> quite short for these methods. But hcat has the following specializations:
>
> function hcat{T}(V::AbstractVector{T}...)
> function vcat{T}(V::AbstractVector{T}...)
> function hcat{T}(A::AbstractVecOrMat{T}...)
> function vcat{T}(A::AbstractMatrix{T}...)
>
> I can *totally* understand why these specializations are worthwhile for
> DenseArray or Array objects. For instance, in the first three of these
> specalizations, you can take advantage of the fact that, when using Fortran
> ordering, the input matrices will be laid out *linearly and consecutively
> *in the output matrix. And indeed, if you look at the implementation of
> these methods, the implementations are taking advantage of this fact.
>
> But what happens if the concrete matrix uses C ordering, or perhaps some
> sort of novel/compressed/distributed storage method? These implementations
> be suboptimal---and not just because of the layout assumption, but also
> because they use straight for-loops or list comprehensions to move elements
> *one
> element at a time. *Again, for Array objects, those loops are going to be
> nice and tight when compiled down. But consider how inefficient that is for
> sparse matrices, for instance.
>
> Of course, there are a couple of potential solutions for the creator of a
> concrete array class. One is to study abstractarray.jl carefully and
> override every single specialized method provided. Of course, this
> increases the work required of the type designer, and is somewhat counter
> to the purpose of having a truly abstract class. And this issue is made
> more complex by the challenge of what to do about *heterogeneous *mixtures
> of abstract arrays? For instance, what to do about concatenations of sparse
> and dense arrays? In this case, I would think that the sparse array class
> should be able to decide for itself whether to emit a dense or a sparse
> array, depending on the total sparsity level of the input. That's going to
> be made more difficult by having to override multiple methods; I'm finding
> that even the invoke approach is not covering all the cases.
>
> I would suggest, therefore, that the right solution in cases like this is:
>
> - Move these "specialized" methods up to the DenseArray or Array
> level, truly keeping AbstractArray as light as possible.
> - Reimplement the AbstractArray methods that remain to *avoid
> *single-element
> moves, preferring block moves, in case the concrete types have a faster
> way
> to handle those.
> - Find an elegant solution to the heterogeneous array dispatch
> problem.
>
> On this last point, perhaps a promote_array hierarchy is in order here.
> Consider the implementation of cat in AbstractArray instance:
>
> cat(catdim::Integer, A::AbstractArray...) = cat_t(catdim, promote_eltype(A
> ...), A...)
>
> I like this: it doesn't do the *conversion *yet, but at least it
> establishes what the eltype should be. Well, what if, instead of
> promote_eltype, there was some sort of promote_array method that abstract
> array classes could override. The purpose of this method would be to
> determine not just the final eltype but the final *concrete* array class
> as well. New array classes could override promote_array as needed. And if
> the second argument of cat_t were the output of promote_array, subclasses
> could override it and provide their own storage-optimized implementation
> without having to reimplement all the dispatch.
>
> Thoughts are welcome. If this looks promising I could even come up with a
> prototype of promote_array.
>