[julia-users] Re: Array/Cell - a useful distinction, or not?

Matt Bauman Tue, 29 Apr 2014 08:53:52 -0700

I use cell arrays very often in Matlab, too, but I've found that I often 
don't even need to worry about the distinction in Julia.  Square 
brackets will constrain the element type as much as possible, and if that's 
not possible you get an Any array (Any[] == {}).
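
A minimal sketch of that literal behaviour, assuming a current Julia 1.x release (where the `{}` literal from this thread no longer exists, so only `Any[...]` is shown). The variable names are purely illustrative:

```julia
# Square brackets pick the narrowest common element type they can.
a = [1, 2, 3]          # all Int                 -> Vector{Int}
b = [1, 2.5]           # Int and Float64 promote -> Vector{Float64}
c = [1, "two", 3.0]    # no common concrete type -> Vector{Any}
d = Any[1, 2, 3]       # element type forced to Any explicitly

for v in (a, b, c, d)
    println(typeof(v))
end
```

Only `c` and `d` behave like a Matlab cell array; `a` and `b` are stored as plain contiguous numeric arrays.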


Moreover, most of what I used cell arrays for in Matlab is completely 
obviated in Julia: cell arrays of strings (strings are first class) and 
passing/parsing/splatting varargs (keyword arguments are wonderful and any 
Julian collection can be splatted).
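
As a rough illustration of those two uses, here is a sketch in current Julia 1.x syntax. The function `describe`, its `sep` keyword, and the variables are hypothetical names made up for this example:

```julia
# Hypothetical function: positional varargs plus one keyword argument.
function describe(items...; sep = ", ")
    # `items` arrives as a tuple; join prints each element in turn.
    return join(items, sep)
end

labels = ["alpha", "beta", "gamma"]   # a plain Vector{String}, no cell array needed
mixed  = (1, 2.5, "three")            # tuples can be splatted as well

println(describe(labels...))              # splat a vector into varargs
println(describe(mixed...; sep = " | "))  # splat a tuple and pass a keyword
```

In Matlab both the list of strings and the forwarded argument list would typically have been cell arrays.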

On Tuesday, April 29, 2014 11:28:28 AM UTC-4, Oliver Woodford wrote:
>
>
> On Tuesday, April 29, 2014 3:48:52 PM UTC+1, Ivar Nesje wrote:
>>
>> Sorry for nitpicking, but point 3 is wrong, and it might cause trouble in 
>> the following discussion.
>>
>> f{T<:Real}(a::Array{T})
>> matches any array with an element type that is a subtype of Real (e.g. 
>> Integer[1,2,BigInt(44)] and Real[1, 3.4]).
>>
>  
> Are you saying that f{T<:Real}(a::Array{T}) covers both homogeneous and 
> heterogeneous arrays, whereas f(a::Array{Real}) only covers heterogeneous 
> arrays? If that's the case it strikes me as just as confusing.
>
>
>> I have had trouble with this too, but now that I somewhat understand the 
>> rationale, I'm less frustrated. I'm not at the point of defending the 
>> current behaviour (yet), so others will have to do that (again).
>>
>
> If this has already been discussed then by all means post a link to it. No 
> point repeating things!
>  
>
>>
>> On Tuesday, April 29, 2014 at 11:38:33 UTC+2, Oliver Woodford wrote:
>>>
>>> A habitual MATLAB user, I've been trying out Julia over the last two 
>>> weeks to see if it might be a suitable replacement. I want something that 
>>> is as fast to develop in, but runs much faster. Perhaps I'll 
>>> write about my general thoughts in another post. However, in this thread I 
>>> want to address one linguistic thing I found confusing.
>>>
>>> Ignoring subarrays and dense/sparse arrays, there are two main types of 
>>> array in Julia. I will call them homogeneous and heterogeneous. Homogeneous 
>>> arrays are declared as having all elements be the same type: e.g. 
>>> Array{Float64}. They are efficient to store in memory, as the elements are 
>>> simply laid out consecutively in memory. Heterogeneous arrays have an 
>>> abstract element type, e.g. Array{Real}. The way Julia interprets this is 
>>> that every element must be a concrete subtype of Real, but that they don't 
>>> have to be the same type. Each element can therefore be a different type, 
>>> with different storage requirements, so these arrays contain a pointer to 
>>> each element, which is then stored somewhere else - this carries a massive 
>>> overhead. In MATLAB these arrays would be termed an array and a cell array 
>>> respectively, so there is a clear distinction. What I found confusing with 
>>> Julia is that the distinction is less clear.
>>>
>>> This confusion was highlighted in a stackoverflow 
>>> question<http://stackoverflow.com/questions/23326848/julia-arrays-with-abstract-parameters-cause-errors-but-variables-with-abstract>,
>>> which I'll outline again now:
>>>
>>> f(x::Real) = x is equivalent to f{T<:Real}(x::T) = x, but f(x::Array{Real}) 
>>> = x is different from f{T<:Real}(x::Array{T}) = x.
>>>
>>> The second form for input arrays, requiring static parameters, is needed 
>>> to declare that the array is homogeneous, not heterogeneous. This seems a 
>>> funny way of doing things to me because:
>>> 1. The homogeneity/heterogeneity of the array is a characteristic of the 
>>> array only, and not of the function
>>> 2. The static parameter T is not required anywhere else, and the Julia 
>>> style guide<http://julia.readthedocs.org/en/latest/manual/style-guide/#don-t-use-unnecessary-static-parameters> 
>>> explicitly counsels against the use of such parameters where they are 
>>> unnecessary.
>>> 3. To declare a function which can take homogeneous or heterogeneous 
>>> arrays, I believe you'd have to do something like 
>>> f{T<:Real}(x::Union(Array{T}, Array{Real})) = x, which seems totally 
>>> bizarre (due to point 1).
>>>
>>> What I would advocate instead is two types of array, one homogeneous, one 
>>> heterogeneous. Array for homogeneous and Cell for heterogeneous would 
>>> work. It would do away with the need for static parameters in this case, 
>>> and also, in my view, make people far more aware of when they are using the 
>>> different types of array. I suspect many beginners are oblivious to the 
>>> distinction, currently.
>>>
>>> In the stackoverflow question, someone suggested two points against this:
>>> 1. Having an array whose elements are all guaranteed to be some subtype 
>>> of Real is not particularly useful without specifying which subtype since 
>>> without that information almost no structural information is being provided 
>>> to the compiler (e.g. about memory layout, etc.)
>>> Well, firstly I disagree; there is a lot of structural information being 
>>> supplied - to read each element, the compiler knows that it just needs to 
>>> compute an offset, rather than compute an offset, read a pointer and then 
>>> read another memory location. However, I don't think this is exploited 
>>> (though it could be) because the function will be recompiled from scratch 
>>> for each element type. Secondly, this isn't about helping the compiler, 
>>> it's about making the language more consistent and sensible - helping the 
>>> *user*.
>>> 2. You almost always pass homogeneous arrays of a concrete type as 
>>> arguments anyway and the compiler is able to specialize on that.
>>> Firstly, homogeneous arrays that you pass in *always* have a concrete 
>>> type. Secondly, you don't always know what that type will be. It might be 
>>> Float64 or Uint8, etc.
>>>
>>> I haven't yet heard a convincing counterargument to it making more sense 
>>> to distinguish homogeneous and heterogeneous arrays by the array type rather 
>>> than by static function parameter.
>>>
>>> Let the discussion begin...
>>>
>>
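
The dispatch behaviour argued over in the quoted messages can be checked directly. The sketch below assumes a current Julia 1.x release, where the thread's `f{T<:Real}(a::Array{T})` is spelled `f(a::Array{T}) where {T<:Real}`. The function names `f_exact`, `f_param`, and `f_short` are hypothetical, chosen only to keep the three signatures apart:

```julia
# Matches only arrays whose element type is exactly Real.
f_exact(x::Array{Real}) = "Array{Real}"

# Matches Array{T} for every T that is a subtype of Real, including T == Real.
f_param(x::Array{T}) where {T<:Real} = "Array{T}, T = $T"

# Shorthand for the same signature without naming T.
f_short(x::Array{<:Real}) = "Array{<:Real}"

floats = [1.0, 2.0]     # Vector{Float64}, homogeneous
reals  = Real[1, 3.4]   # Vector{Real}, heterogeneous

println(f_param(floats))               # T is bound to Float64
println(f_param(reals))                # T is bound to Real
println(f_exact(reals))                # exact element type matches
println(applicable(f_exact, floats))   # false: Array{Float64} is not an Array{Real}
println(Array{Float64} <: Array{Real}) # false: type parameters are invariant
```

So the parametric form already accepts both kinds of array, as Ivar points out, and the `Union` from point 3 is not needed.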
