Thank you for the link and the explanation, John -- it's definitely helpful. Is current work with Nullable and data structures available anywhere in JuliaStats, or is it being developed elsewhere?
On Saturday, May 30, 2015 at 12:23:09 PM UTC-4, John Myles White wrote:
>
> David,
>
> To clarify your understanding of what's wrong with DataArrays, check out
> the DataArray code for something like getindex():
> https://github.com/JuliaStats/DataArrays.jl/blob/master/src/indexing.jl#L109
>
> I don't have a full understanding of Julia's type inference system, but
> here's my best attempt to explain my current understanding of the system
> and how it affects Seth's original example.
>
> Consider two simple functions, f and g, and their application inside a
> larger function, gf():
>
> # Given pre-existing definitions such that:
> #
> # f(input::R) => output::S
> # g(input::S) => output::T
> #
> # What can we infer about the following larger function?
> function gf(x::Any)
>     return g(f(x))
> end
>
> The important questions to ask are about what we can infer at
> method-compile-time for gf(). Specifically, ask:
>
> (1) Can we determine the type S given the type R, which is currently bound
> to the type of the specific value of x on which we called gf()? (Note that
> it was the act of calling gf(x) on a specific value that triggered the
> entire method-compilation process.)
>
> (2) Can we determine that the type S is a specific concrete type?
> Concreteness matters, because we're going to have to think about how the
> output of f() affects the input of g(). In particular, we need to know
> whether we need to perform run-time dispatch inside of gf() or whether all
> dispatch inside of gf() can be determined statically given the type of
> gf()'s argument x.
>
> (3) Assuming that we successfully determined a concrete type S given R,
> can we repeat the process for g() to yield a concrete type for T?
> If so, then we'll be able to infer, at least for one specific type R, the
> concrete output type of gf(x). If not, we'll have to give looser bounds on
> the concrete types that come out of gf() given an input of a specific value
> like our current x. That would be important if we were going to call gf()
> inside another function.
>
> Hope that helps.
>
> -- John
>
> On Saturday, May 30, 2015 at 4:51:09 AM UTC-7, David Gold wrote:
>>
>> @Steven,
>>
>> Would you help me to understand the difference between this case here and
>> the case of DataArray{T}s -- which, by my understanding, are basically
>> AbstractArray{Union{T, NA}, 1}s? My first thought was that taking a
>> Union{Bool, AbstractArray{Float64, 2}} argument would potentially
>> interfere with the compiler's ability to perform type inference, similar
>> to how looping through a DataArray can incur a cost from the compiler
>> having to deal with possible NAs.
>>
>> But what you're saying is that this does not apply here, since presumably
>> the argument, whether it is a Bool or an AbstractArray, would be
>> type-stable throughout the function's operations -- unlike the values
>> contained in a DataArray. Would it be fair to say that dealing with Union
>> types tends to hurt performance mostly when they are looped over in some
>> sort of container, since in that case it's not a matter of simply
>> dispatching a specially compiled method on one of the union's member
>> types or the other?
>>
>> On Friday, May 29, 2015 at 9:49:45 PM UTC-4, Steven G. Johnson wrote:
>>>
>>> *No!* This is one of the most common misconceptions about Julia
>>> programming.
>>>
>>> The type declarations in function arguments have *no impact* on
>>> performance. Zero. Nada. Zip. You *don't have to declare a type at all*
>>> in the function argument, and it *still* won't matter for performance.
>>>
>>> The argument types are just a filter for when the function is applicable.
>>>
>>> The first time a function is called, a specialized version is compiled
>>> for the types of the arguments that you pass it. Subsequently, when you
>>> call it with arguments of the same type, the specialized version is
>>> called.
>>>
>>> Note also that a default argument foo(x, y=false) is exactly equivalent
>>> to defining
>>>
>>> foo(x, y) = ...
>>> foo(x) = foo(x, false)
>>>
>>> So, if you call foo(x, [1,2,3]), it calls a version of foo(x, y)
>>> specialized for an Array{Int} in the second argument. The existence of a
>>> version of foo specialized for a boolean y is irrelevant.
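For what it's worth, John's gf() example above can be checked concretely at the REPL. This is a sketch of my own -- the definitions of f, g, and f2 below, and the use of Base.return_types to inspect inference, are my illustration, not code from the thread:

```julia
# Type-stable pipeline: R = Int, S = Int, T = Int are all inferable,
# so all three of John's questions get a "yes".
f(x::Int) = x + 1          # f: Int -> Int
g(x::Int) = 2 * x          # g: Int -> Int
gf(x) = g(f(x))

# Inference proves a single concrete output type for gf(::Int).
Base.return_types(gf, (Int,))   # one-element vector containing Int

# Now break question (2): f2's output type depends on the *value* of x,
# so S is only the abstract Union{Int, Float64}, and inference for the
# composition has to widen to a union rather than a concrete type.
f2(x::Int) = x > 0 ? 1 : 1.0
gf2(x) = 2 * f2(x)
Base.return_types(gf2, (Int,))  # Union{Float64, Int}, not concrete
```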
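Steven's point, and the distinction David asks about, can also be demonstrated directly. The functions below (scale, total) are my own hypothetical examples, not from the thread:

```julia
# The Union in the signature is only a filter on which calls are accepted;
# each concrete argument type still gets its own specialized compiled method,
# so there is no run-time type uncertainty inside the function body.
scale(x::Union{Bool, Matrix{Float64}}) = isa(x, Bool) ? 1.0 : sum(x)

scale(true)        # runs the version specialized for Bool
scale(ones(2, 2))  # runs the version specialized for Matrix{Float64}

# By contrast, the cost David describes arises when the *elements* of a
# container have a Union type: the element type cannot be resolved until
# each iteration, at run time, rather than once per call.
function total(v)
    s = 0.0
    for x in v      # with a Union element type, each x is checked per iteration
        s += x
    end
    return s
end

xs = Union{Int, Float64}[1, 2.0, 3]
total(xs)
```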
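And Steven's point about default arguments is easy to see in the method table -- a quick sketch, with a placeholder body of my own for foo:

```julia
# foo(x, y=false) lowers to exactly two methods: the full two-argument
# method and a one-argument forwarder that supplies the default.
foo(x, y=false) = (x, y)

length(methods(foo))   # 2: foo(x, y) and foo(x)

foo(1)                 # the one-argument method just forwards the default
foo(1, [1, 2, 3])      # uses a version of foo(x, y) specialized for
                       # Vector{Int} in the second argument; the Bool
                       # default plays no role in this call
```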
