The NullableArrays work is very far behind schedule. I developed RSI right after announcing the work on NullableArrays and am still recovering, which means that I can spend very little time working on Julia code these days.
I'll give you more details offline.

 -- John

On Saturday, May 30, 2015 at 10:48:10 AM UTC-7, David Gold wrote:
>
> Thank you for the link and the explanation, John -- it's definitely
> helpful. Is current work with Nullable and data structures available
> anywhere in JuliaStats, or is it being developed elsewhere?
>
> On Saturday, May 30, 2015 at 12:23:09 PM UTC-4, John Myles White wrote:
>>
>> David,
>>
>> To clarify your understanding of what's wrong with DataArrays, check out
>> the DataArray code for something like getindex():
>> https://github.com/JuliaStats/DataArrays.jl/blob/master/src/indexing.jl#L109
>>
>> I don't have a full understanding of Julia's type inference system, but
>> here's my best attempt to explain my current understanding of the system
>> and how it affects Seth's original example.
>>
>> Consider two simple functions, f and g, and their application inside a
>> larger function, gf():
>>
>> # Given pre-existing definitions such that:
>> #
>> # f(input::R) => output::S
>> # g(input::S) => output::T
>> #
>> # What can we infer about the following larger function?
>> function gf(x::Any)
>>     return g(f(x))
>> end
>>
>> The important questions to ask are about what we can infer at
>> method-compile-time for gf(). Specifically, ask:
>>
>> (1) Can we determine the type S given the type R, which is currently
>> bound to the type of the specific value of x on which we called gf()? (Note
>> that it was the act of calling gf(x) on a specific value that triggered the
>> entire method-compilation process.)
>>
>> (2) Can we determine that the type S is a specific concrete type?
>> Concreteness matters, because we're going to have to think about how the
>> output of f() affects the input of g().
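[Editor's note: John's questions (1) and (2) can be made concrete with a small sketch. The definitions of f and g below are invented for illustration; they are not from Seth's original example. Base.return_types reports what inference concludes for a given argument-type signature.]

```julia
# A concrete instance of the f/g composition above; these particular
# definitions are made up for illustration, not taken from Seth's example.
f(x::Int) = x + 1                     # R = Int  => S = Int
g(x::Int) = x / 2                     # S = Int  => T = Float64

gf(x) = g(f(x))

# Questions (1) and (2) have good answers here: given R = Int, inference
# determines the concrete types S = Int and T = Float64 at compile time.
Base.return_types(gf, (Int,))         # one element: Float64

# If f's output type can't be pinned down to a single concrete type,
# inference can only give a looser (Union) bound on what comes out of gf:
f2(x::Int) = x > 0 ? x : 0.0          # S = Union{Int, Float64}
g2(x::Int) = x / 2                    # T = Float64 on this branch
g2(x::Float64) = "non-positive"      # T = String on this branch
gf2(x) = g2(f2(x))
Base.return_types(gf2, (Int,))        # one element: Union{Float64, String}
```

When inference ends up with a Union like this, calls to gf2() from other functions must in turn cope with the looser bound, which is exactly the "calling gf() inside another function" concern John raises next.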
>> In particular, we need to know
>> whether we need to perform run-time dispatch inside of gf() or whether all
>> dispatch inside of gf() can be determined statically given the type of
>> gf()'s argument x.
>>
>> (3) Assuming that we successfully determined a concrete type S given R,
>> can we repeat the process for g() to yield a concrete type for T? If so,
>> then we'll be able to infer, at least for one specific type R, the concrete
>> output type of gf(x). If not, we'll have to give looser bounds on the
>> concrete types that come out of gf() given an input of a specific value
>> like our current x. That would be important if we were going to call gf()
>> inside another function.
>>
>> Hope that helps.
>>
>>  -- John
>>
>> On Saturday, May 30, 2015 at 4:51:09 AM UTC-7, David Gold wrote:
>>>
>>> @Steven,
>>>
>>> Would you help me to understand the difference between this case here
>>> and the case of DataArray{T}s -- which, by my understanding, are basically
>>> AbstractArray{Union{T, NaN}, 1}'s? My first thought was that taking a
>>> Union{Bool, AbstractArray{Float, 2}} argument would potentially interfere
>>> with the compiler's ability to perform type inference, similar to how
>>> looping through a DataArray can incur a cost from the compiler having
>>> to deal with possible NAs.
>>>
>>> But what you're saying is that this does not apply here, since
>>> presumably the argument, whether it is a Bool or an AbstractArray, would be
>>> type-stable throughout the function's operations -- unlike the values
>>> contained in a DataArray. Would it be fair to say that dealing with Union
>>> types tends to be dangerous to performance mostly when the values they describe are
>>> looped over in some sort of container, since in that case it's not a matter of simply
>>> dispatching a specially compiled method on one member type of the union or the
>>> other?
>>>
>>> On Friday, May 29, 2015 at 9:49:45 PM UTC-4, Steven G. Johnson wrote:
>>>>
>>>> *No!* This is one of the most common misconceptions about Julia
>>>> programming.
>>>>
>>>> The type declarations in function arguments have *no impact* on
>>>> performance. Zero. Nada. Zip. You *don't have to declare a type at
>>>> all* in the function argument, and it *still* won't matter for
>>>> performance.
>>>>
>>>> The argument types are just a filter for when the function is
>>>> applicable.
>>>>
>>>> The first time a function is called, a specialized version is compiled
>>>> for the types of the arguments that you pass it. Subsequently, when you
>>>> call it with arguments of the same types, the specialized version is called.
>>>>
>>>> Note also that a default argument foo(x, y=false) is exactly equivalent
>>>> to defining
>>>>
>>>> foo(x, y) = ...
>>>> foo(x) = foo(x, false)
>>>>
>>>> So, if you call foo(x, [1,2,3]), it calls a version of foo(x, y)
>>>> specialized for an Array{Int} in the second argument. The existence of a
>>>> version of foo specialized for a boolean y is irrelevant.
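[Editor's note: Steven's two points can be sketched in a few lines. The function names below are invented for illustration; the behavior shown is standard Julia method specialization and default-argument lowering.]

```julia
# Point 1: an untyped method and a typed one both get compiled into a
# version specialized for the actual argument types at the call site.
# The declaration only restricts which calls are allowed; it does not
# make the generated code any faster.
foo_untyped(x, y) = x .+ y
foo_typed(x::Vector{Int}, y::Vector{Int}) = x .+ y

foo_untyped([1, 2, 3], [4, 5, 6])   # both compile a specialization for
foo_typed([1, 2, 3], [4, 5, 6])     # (Vector{Int}, Vector{Int})

# Point 2: the default-argument lowering Steven describes.
bar(x, y=false) = (x, y)
# ...is equivalent to writing:
#   bar(x, y) = (x, y)
#   bar(x)    = bar(x, false)

bar(1)             # uses a specialization for (Int, Bool)
bar(1, [1, 2, 3])  # uses a separate specialization for (Int, Vector{Int});
                   # the Bool-specialized version is irrelevant here
```

So the existence of a boolean default value never "locks in" a slow path: each distinct argument-type combination gets its own compiled specialization.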