[julia-users] Re: Does union() imply worse performance?

John Myles White Sat, 30 May 2015 09:24:12 -0700

David,

To clarify your understanding of what's wrong with DataArrays, check out
the DataArrays code for something like getindex():
https://github.com/JuliaStats/DataArrays.jl/blob/master/src/indexing.jl#L109
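
As I read it, the pattern to look for there is an indexing method that
returns either the missing-value sentinel or a stored value, depending on a
run-time check. Here is a minimal sketch of that pattern, written for
Julia 1.x syntax and not copied from DataArrays. NotAvailable, NA_SENTINEL,
MaskedVector and first_element are hypothetical names used only for
illustration:

```julia
# Hypothetical stand-ins; these are NOT the DataArrays.jl definitions.
struct NotAvailable end                 # plays the role of a missing-value sentinel
const NA_SENTINEL = NotAvailable()

struct MaskedVector{T}
    data::Vector{T}                     # stored values
    isna::Vector{Bool}                  # true where the value counts as missing
end

# The returned type depends on a run-time value (isna[i]), not just on
# the argument types: the result is either a NotAvailable or a T.
function Base.getindex(v::MaskedVector, i::Integer)
    return v.isna[i] ? NA_SENTINEL : v.data[i]
end

first_element(v) = v[1]

v = MaskedVector([1.0, 2.0, 3.0], [false, true, false])

# Ask inference what it concluded for a MaskedVector{Float64} argument.
inferred = Base.return_types(first_element, (MaskedVector{Float64},))
println(inferred)
println(isconcretetype(inferred[1]))
```

Because the branch depends on the contents of isna, inference cannot narrow
the result of v[i] to a single concrete type.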

I don't fully understand Julia's type inference system, but here's my best
attempt to explain how I think it works and how it affects Seth's original
example.

Consider two simple functions, f and g, and their application inside a 
larger function, gf():

# Given pre-existing definitions such that:
#
# f(input::R) => output::S
# g(input::S) => output::T
#
# What can we infer about the following larger function?
function gf(x::Any)
    return g(f(x))
end

The important questions to ask are about what we can infer at 
method-compile-time for gf(). Specifically, ask:

(1) Can we determine the type S given the type R, which is currently bound 
to the type of the specific value of x on which we called gf()? (Note that 
it was the act of calling gf(x) on a specific value that triggered the 
entire method-compilation process.)

(2) Can we determine that the type S is a specific concrete type? 
Concreteness matters, because we're going to have to think about how the 
output of f() affects the input of g(). In particular, we need to know 
whether we need to perform run-time dispatch inside of gf() or whether all 
dispatch inside of gf() can be determined statically given the type of 
gf()'s argument x.

(3) Assuming that we successfully determined a concrete type S given R, can 
we repeat the process for g() to yield a concrete type for T? If so, then 
we'll be able to infer, at least for one specific type R, the concrete 
output type of gf(x). If not, we can only give looser bounds on the types 
that can come out of gf() for an input of the same type as our current x. 
That matters if we go on to call gf() inside another function.
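
To make the three questions concrete, here is a small sketch written for
Julia 1.x. The functions f_stable, f_unstable, g, gf_stable and gf_unstable
are hypothetical examples, and Base.return_types is used only to ask
inference what it concluded for an Int argument:

```julia
# Hypothetical definitions for illustration only.
f_stable(x::Int) = x + 1.0                     # S is always Float64
f_unstable(x::Int) = x > 0 ? x : "negative"    # S is Int or String, decided at run time

g(y) = (y, y)                                  # T depends entirely on S

gf_stable(x) = g(f_stable(x))
gf_unstable(x) = g(f_unstable(x))

# Each call returns a vector with one inferred type per matching method.
stable_types = Base.return_types(gf_stable, (Int,))
unstable_types = Base.return_types(gf_unstable, (Int,))

println(stable_types, " concrete? ", isconcretetype(stable_types[1]))
println(unstable_types, " concrete? ", isconcretetype(unstable_types[1]))
```

In the stable case every step from R to S to T yields a concrete type, so
the call to g() inside gf_stable() can be resolved when the method is
compiled. In the unstable case the type of f_unstable(x) depends on the
value of x, so the inferred output type of gf_unstable() is not concrete.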

Hope that helps.

 -- John

On Saturday, May 30, 2015 at 4:51:09 AM UTC-7, David Gold wrote:
>
> @Steven,
>
> Would you help me to understand the difference between this case here and 
> the case of DataArray{T}s -- which, by my understanding, are basically 
> AbstractArray{Union{T, NaN}, 1}'s? My first thought was that taking a 
> Union{Bool, AbstractArray{Float, 2}} argument would potentially interfere 
> with the compiler's ability to perform type inference, similar to how 
> looping through a DataArray can experience a cost from the compiler having 
> to deal with possible NaNs. 
>
> But what you're saying is that this does not apply here, since presumably 
> the argument, whether it is a Bool or an AbstractArray, would be 
> type-stable throughout the function's operations -- unlike the values 
> contained in a DataArray. Would it be fair to say that dealing with Union{} 
> types tends to be dangerous to performance mostly when they are looped over 
> in some sort of container, since in that case it's not a matter of simply 
> dispatching a specially compiled method on one of the conjunct types or the 
> other?
>
> On Friday, May 29, 2015 at 9:49:45 PM UTC-4, Steven G. Johnson wrote:
>>
>> *No!*  This is one of the most common misconceptions about Julia 
>> programming.
>>
>> The type declarations in function arguments have *no impact* on 
>> performance.  Zero.  Nada.  Zip.  You *don't have to declare a type at 
>> all* in the function argument, and it *still* won't matter for 
>> performance.
>>
>> The argument types are just a filter for when the function is applicable.
>>
>> The first time a function is called, a specialized version is compiled 
>> for the types of the arguments that you pass it.  Subsequently, when you 
>> call it with arguments of the same type, the specialized version is called.
>>
>> Note also that a default argument foo(x, y=false) is exactly equivalent 
>> to defining
>>
>>     foo(x,y) = ...
>>     foo(x) = foo(x, false)
>>
>> So, if you call foo(x, [1,2,3]), it calls a version of foo(x,y) 
>> specialized for an Array{Int} in the second argument.  The existence of a 
>> version of foo specialized for a boolean y is irrelevant.
>>
>
