David,
To clarify your understanding of what's wrong with DataArrays, check out
the DataArray code for something like
getindex():
https://github.com/JuliaStats/DataArrays.jl/blob/master/src/indexing.jl#L109
I don't fully understand Julia's type inference system, but here's my best
attempt to explain how I think it works and how it affects Seth's original
example.
Consider two simple functions, f and g, and their application inside a
larger function, gf():
# Given pre-existing definitions such that:
#
# f(input::R) => output::S
# g(input::S) => output::T
#
# What can we infer about the following larger function?
function gf(x::Any)
    return g(f(x))
end
The important questions concern what we can infer at the time a method of
gf() is compiled. Specifically, ask:
(1) Can we determine the type S given the type R, which is currently bound
to the type of the specific value of x on which we called gf()? (Note that
it was the act of calling gf(x) on a specific value that triggered the
entire method-compilation process.)
(2) Can we determine that the type S is a specific concrete type?
Concreteness matters, because we're going to have to think about how the
output of f() affects the input of g(). In particular, we need to know
whether we need to perform run-time dispatch inside of gf() or whether all
dispatch inside of gf() can be determined statically given the type of
gf()'s argument x.
(3) Assuming that we successfully determined a concrete type S given R, can
we repeat the process for g() to yield a concrete type for T? If so, then
we'll be able to infer, at least for one specific type R, the concrete
output type of gf(x). If not, we can only give looser bounds on the types
that gf() may return for an input with the same type as our current x. That
matters whenever gf() is called inside another function, because inference
for the caller depends on what is known about gf()'s return type.
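
As a rough illustration of the three questions, here is a small sketch. The
functions f_stable, f_unstable, and g are made up for this example (they are
not from DataArrays), and it assumes a recent Julia 1.x, where
isconcretetype replaces the older isleaftype. The exact types that inference
reports can vary between Julia versions.

```julia
# Hypothetical definitions, for illustration only.
f_stable(x::Int) = x / 2                  # always returns a Float64
f_unstable(x::Int) = x > 0 ? x : "neg"    # returns an Int or a String, depending on the value

g(y::Float64) = y + 1.0
g(y::Int) = y + 1
g(y::String) = y * "!"

gf_stable(x::Any) = g(f_stable(x))
gf_unstable(x::Any) = g(f_unstable(x))

# Ask inference what it can conclude for R = Int.
S_stable = first(Base.return_types(f_stable, (Int,)))
S_unstable = first(Base.return_types(f_unstable, (Int,)))
T_stable = first(Base.return_types(gf_stable, (Int,)))
T_unstable = first(Base.return_types(gf_unstable, (Int,)))

println("S (stable):   ", S_stable, "  concrete? ", isconcretetype(S_stable))
println("S (unstable): ", S_unstable, "  concrete? ", isconcretetype(S_unstable))
println("T (stable):   ", T_stable, "  concrete? ", isconcretetype(T_stable))
println("T (unstable): ", T_unstable, "  concrete? ", isconcretetype(T_unstable))
```

In the stable case S is concrete, so the call to g() can be resolved when
gf_stable() is compiled. In the unstable case S is not concrete, so the
compiler has to account for more than one possible method of g(), and the
looser bound carries through to the result of gf_unstable().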
Hope that helps.
-- John
On Saturday, May 30, 2015 at 4:51:09 AM UTC-7, David Gold wrote:
>
> @Steven,
>
> Would you help me to understand the difference between this case here and
> the case of DataArray{T}s -- which, by my understanding, are basically
> AbstractArray{Union{T, NAtype}, 1}'s? My first thought was that taking a
> Union{Bool, AbstractArray{Float, 2}} argument would potentially interfere
> with the compiler's ability to perform type inference, similar to how
> looping through a DataArray can experience a cost from the compiler having
> to deal with possible NAs.
>
> But what you're saying is that this does not apply here, since presumably
> the argument, whether it is a Bool or an AbstractArray, would be
> type-stable throughout the function's operations -- unlike the values
> contained in a DataArray. Would it be fair to say that dealing with Union{}
> types tends to be dangerous to performance mostly when they are looped over
> in some sort of container, since in that case it's not a matter of simply
> dispatching a specially compiled method on one of the conjunct types or the
> other?
>
> On Friday, May 29, 2015 at 9:49:45 PM UTC-4, Steven G. Johnson wrote:
>>
>> *No!* This is one of the most common misconceptions about Julia
>> programming.
>>
>> The type declarations in function arguments have *no impact* on
>> performance. Zero. Nada. Zip. You *don't have to declare a type at
>> all* in the function argument, and it *still* won't matter for
>> performance.
>>
>> The argument types are just a filter for when the function is applicable.
>>
>> The first time a function is called, a specialized version is compiled
>> for the types of the arguments that you pass it. Subsequently, when you
>> call it with arguments of the same type, the specialized version is called.
>>
>> Note also that a default argument foo(x, y=false) is exactly equivalent
>> to defining
>>
>> foo(x,y) = ...
>> foo(x) = foo(x, false)
>>
>> So, if you call foo(x, [1,2,3]), it calls a version of foo(x,y)
>> specialized for an Array{Int} in the second argument. The existence of a
>> version of foo specialized for a boolean y is irrelevant.
>>
>
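
A small sketch of the two quoted points from Steven, assuming a recent Julia
1.x. The names add_untyped, add_typed, and foo's body are hypothetical and
chosen only for illustration.

```julia
# Hypothetical functions: identical bodies, with and without argument type declarations.
add_untyped(x, y) = x + y
add_typed(x::Int, y::Int) = x + y

# Each is specialized for the concrete argument types it is called with,
# so inference reaches the same conclusion for (Int, Int) in both cases.
rt_untyped = first(Base.return_types(add_untyped, (Int, Int)))
rt_typed = first(Base.return_types(add_typed, (Int, Int)))
println(rt_untyped == rt_typed)

# A default argument defines two methods: foo(x, y) and foo(x).
foo(x, y=false) = (x, y)
println(length(methods(foo)))

# Calling with an array as the second argument uses foo(x, y) specialized
# for that array type; the Bool default plays no role here.
println(typeof(foo(1, [1, 2, 3])))
```

The declarations on add_typed only restrict which arguments the method
accepts; they do not change what the compiler generates for (Int, Int).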