[julia-users] Array/Cell - a useful distinction, or not?

Oliver Woodford Tue, 29 Apr 2014 07:34:08 -0700

A habitual MATLAB user, I've been trying out Julia over the last two weeks 
to see if it might be a suitable replacement. I want something that is as 
fast to develop in, but has a much faster runtime. Perhaps I'll write 
about my general thoughts in another post. However, in this thread I want 
to address one linguistic thing I found confusing.


Ignoring subarrays and the dense/sparse distinction, there are two main 
types of array in Julia. I will call them homogeneous and heterogeneous. 
Homogeneous arrays are declared with a concrete element type, so all 
elements have the same type: e.g. Array{Float64}. They are efficient to 
store, as the elements are simply laid out consecutively in memory. 
Heterogeneous arrays have an abstract element type, e.g. Array{Real}. The 
way Julia interprets this is that every element must be of some concrete 
subtype of Real, but that they don't all have to be the same type. Each 
element can therefore be a different type, with different storage 
requirements, so these arrays contain a pointer to each element, which is 
then stored somewhere else - this indirection carries a massive overhead. 
In MATLAB these would be termed an array and a cell array respectively, so 
there is a clear distinction. What I found confusing with Julia is that the 
distinction is less clear.
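
To make the two kinds of array concrete, here is a minimal sketch. The 
variable names are purely illustrative, and the element values are arbitrary.

```julia
# Homogeneous: the element type is inferred as the concrete type Float64.
homogeneous = [1.0, 2.0, 3.0]

# Heterogeneous: the element type is forced to the abstract type Real,
# so each element keeps its own concrete type.
heterogeneous = Real[1, 2.5, 3//4]

println(eltype(homogeneous))     # Float64
println(eltype(heterogeneous))   # Real

# Print the concrete type of each element of the heterogeneous array.
for x in heterogeneous
    println(typeof(x))
end
```

Both arrays are written with ordinary array literals, and nothing in how 
they are subsequently used signals which kind you are holding.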

This confusion was highlighted in a stackoverflow 
question<http://stackoverflow.com/questions/23326848/julia-arrays-with-abstract-parameters-cause-errors-but-variables-with-abstract>,
which I'll outline again here:

f(x::Real) = x is equivalent to f{T<:Real}(x::T) = x, but 
f(x::Array{Real}) = x is different from f{T<:Real}(x::Array{T}) = x.
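
The array half of that statement can be checked directly. The sketch below 
uses a hypothetical function g (not from the stackoverflow question) that 
is declared only for Array{Real}, and asks whether it applies to each kind 
of array:

```julia
# g is a made-up name for illustration; it accepts only arrays whose
# element type is exactly Real.
g(x::Array{Real}) = x

floats = [1.0, 2.0]     # element type Float64
mixed  = Real[1, 2.0]   # element type Real

println(isa(floats, Array{Real}))   # false
println(isa(mixed, Array{Real}))    # true
println(applicable(g, floats))      # false
println(applicable(g, mixed))       # true
```

So an array of Float64 is not an Array{Real}, even though every Float64 is 
a Real, and a method declared on Array{Real} does not apply to it.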

The second form for input arrays, requiring static parameters, is needed to 
declare that the array is homogeneous, not heterogeneous. This seems a funny 
way of doing things to me because:
1. The homogeneity/heterogeneity of the array is a characteristic of the 
array only, and not of the function.
2. The static parameter T is not required anywhere else, and the Julia style 
guide<http://julia.readthedocs.org/en/latest/manual/style-guide/#don-t-use-unnecessary-static-parameters>
explicitly counsels against the use of such parameters where they are 
unnecessary.
3. To declare a function which can take homogeneous or heterogeneous arrays, 
I believe you'd have to do something like 
f{T<:Real}(x::Union(Array{T}, Array{Real})) = x, which seems totally bizarre 
(due to point 1).

What I would advocate instead is two types of array, one homogeneous, one 
heterogeneous. Array for homogeneous and Cell for heterogeneous would work. 
It would do away with the need for static parameters in this case, and also, 
in my view, make people far more aware of when they are using the different 
types of array. I suspect many beginners are currently oblivious to the 
distinction.

In the stackoverflow question, someone suggested two points against this:
1. Having an array whose elements are all guaranteed to be some subtype of 
Real is not particularly useful without specifying which subtype since 
without that information almost no structural information is being provided 
to the compiler (e.g. about memory layout, etc.)
Well, firstly I disagree; there is a lot of structural information being 
supplied - to read each element, the compiler knows that it just needs to 
compute an offset, rather than compute an offset, read a pointer and then 
read another memory location. However, I don't think this is exploited 
(though it could be) because the function will be recompiled from scratch 
for each element type. Secondly, this isn't about helping the compiler, 
it's about making the language more consistent and sensible - helping the 
*user*.
2. You almost always pass homogeneous arrays of a concrete type as arguments 
anyway and the compiler is able to specialize on that.
Firstly, homogeneous arrays that you pass in *always* have a concrete type. 
Secondly, you don't always know what that type will be. It might be Float64 
or Uint8, etc.

I haven't yet heard a convincing counterargument to it making more sense to 
distinguish homogeneous and heterogeneous arrays by the array type rather 
than by a static function parameter.

Let the discussion begin...
