The NullableArrays work is very far behind schedule. I developed RSI right after announcing the work on NullableArrays and am still recovering, which means that I can spend very little time working on Julia code these days.
I'll give you more details offline.

 -- John

On Saturday, May 30, 2015 at 10:48:10 AM UTC-7, David Gold wrote:
>
> Thank you for the link and the explanation, John -- it's definitely
> helpful. Is current work with Nullable and data structures available
> anywhere in JuliaStats, or is it being developed elsewhere?
>
> On Saturday, May 30, 2015 at 12:23:09 PM UTC-4, John Myles White wrote:
>>
>> David,
>>
>> To clarify your understanding of what's wrong with DataArrays, check out
>> the DataArray code for something like getindex():
>> https://github.com/JuliaStats/DataArrays.jl/blob/master/src/indexing.jl#L109
>>
>> I don't have a full understanding of Julia's type inference system, but
>> here's my best attempt to explain my current understanding of the system
>> and how it affects Seth's original example.
>>
>> Consider two simple functions, f and g, and their application inside a
>> larger function, gf():
>>
>> # Given pre-existing definitions such that:
>> #
>> # f(input::R) => output::S
>> # g(input::S) => output::T
>> #
>> # What can we infer about the following larger function?
>> function gf(x::Any)
>>     return g(f(x))
>> end
>>
>> The important questions to ask are about what we can infer at
>> method-compile-time for gf(). Specifically, ask:
>>
>> (1) Can we determine the type S given the type R, which is currently
>> bound to the type of the specific value of x on which we called gf()? (Note
>> that it was the act of calling gf(x) on a specific value that triggered the
>> entire method-compilation process.)
>>
>> (2) Can we determine that the type S is a specific concrete type?
>> Concreteness matters, because we're going to have to think about how the
>> output of f() affects the input of g().
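[Editor's note: John's questions (1) and (2) can be made concrete with a small sketch. The definitions of f and g below are invented for illustration; they are not from Seth's original example. Base.return_types reports what inference concludes for a given argument-type signature.]

```julia
# A concrete instance of the f/g composition above; these particular
# definitions are made up for illustration, not taken from Seth's example.
f(x::Int) = x + 1                     # R = Int  => S = Int
g(x::Int) = x / 2                     # S = Int  => T = Float64

gf(x) = g(f(x))

# Questions (1) and (2) have good answers here: given R = Int, inference
# determines the concrete types S = Int and T = Float64 at compile time.
Base.return_types(gf, (Int,))         # one element: Float64

# If f's output type can't be pinned down to a single concrete type,
# inference can only give a looser (Union) bound on what comes out of gf:
f2(x::Int) = x > 0 ? x : 0.0          # S = Union{Int, Float64}
g2(x::Int) = x / 2                    # T = Float64 on this branch
g2(x::Float64) = "non-positive"      # T = String on this branch
gf2(x) = g2(f2(x))
Base.return_types(gf2, (Int,))        # one element: Union{Float64, String}
```

When inference ends up with a Union like this, calls to gf2() from other functions must in turn cope with the looser bound, which is exactly the "calling gf() inside another function" concern John raises next.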
>> In particular, we need to know
>> whether we need to perform run-time dispatch inside of gf() or whether all
>> dispatch inside of gf() can be determined statically given the type of
>> gf()'s argument x.
>>
>> (3) Assuming that we successfully determined a concrete type S given R,
>> can we repeat the process for g() to yield a concrete type for T? If so,
>> then we'll be able to infer, at least for one specific type R, the concrete
>> output type of gf(x). If not, we'll have to give looser bounds on the
>> concrete types that come out of gf() given an input of a specific value
>> like our current x. That would be important if we were going to call gf()
>> inside another function.
>>
>> Hope that helps.
>>
>>  -- John
>>
>> On Saturday, May 30, 2015 at 4:51:09 AM UTC-7, David Gold wrote:
>>>
>>> @Steven,
>>>
>>> Would you help me to understand the difference between this case here
>>> and the case of DataArray{T}s -- which, by my understanding, are basically
>>> AbstractArray{Union{T, NaN}, 1}'s? My first thought was that taking a
>>> Union{Bool, AbstractArray{Float, 2}} argument would potentially interfere
>>> with the compiler's ability to perform type inference, similar to how
>>> looping through a DataArray can incur a cost from the compiler having
>>> to deal with possible NAs.
>>>
>>> But what you're saying is that this does not apply here, since
>>> presumably the argument, whether it is a Bool or an AbstractArray, would be
>>> type-stable throughout the function's operations -- unlike the values
>>> contained in a DataArray. Would it be fair to say that dealing with Union
>>> types tends to be dangerous to performance mostly when the values they describe are
>>> looped over in some sort of container, since in that case it's not a matter of simply
>>> dispatching a specially compiled method on one member type of the union or the
>>> other?
>>>
>>> On Friday, May 29, 2015 at 9:49:45 PM UTC-4, Steven G. Johnson wrote:
>>>>
>>>> *No!* This is one of the most common misconceptions about Julia
>>>> programming.
>>>>
>>>> The type declarations in function arguments have *no impact* on
>>>> performance. Zero. Nada. Zip. You *don't have to declare a type at
>>>> all* in the function argument, and it *still* won't matter for
>>>> performance.
>>>>
>>>> The argument types are just a filter for when the function is
>>>> applicable.
>>>>
>>>> The first time a function is called, a specialized version is compiled
>>>> for the types of the arguments that you pass it. Subsequently, when you
>>>> call it with arguments of the same types, the specialized version is called.
>>>>
>>>> Note also that a default argument foo(x, y=false) is exactly equivalent
>>>> to defining
>>>>
>>>> foo(x, y) = ...
>>>> foo(x) = foo(x, false)
>>>>
>>>> So, if you call foo(x, [1,2,3]), it calls a version of foo(x, y)
>>>> specialized for an Array{Int} in the second argument. The existence of a
>>>> version of foo specialized for a boolean y is irrelevant.
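[Editor's note: Steven's two points can be sketched in a few lines. The function names below are invented for illustration; the behavior shown is standard Julia method specialization and default-argument lowering.]

```julia
# Point 1: an untyped method and a typed one both get compiled into a
# version specialized for the actual argument types at the call site.
# The declaration only restricts which calls are allowed; it does not
# make the generated code any faster.
foo_untyped(x, y) = x .+ y
foo_typed(x::Vector{Int}, y::Vector{Int}) = x .+ y

foo_untyped([1, 2, 3], [4, 5, 6])   # both compile a specialization for
foo_typed([1, 2, 3], [4, 5, 6])     # (Vector{Int}, Vector{Int})

# Point 2: the default-argument lowering Steven describes.
bar(x, y=false) = (x, y)
# ...is equivalent to writing:
#   bar(x, y) = (x, y)
#   bar(x)    = bar(x, false)

bar(1)             # uses a specialization for (Int, Bool)
bar(1, [1, 2, 3])  # uses a separate specialization for (Int, Vector{Int});
                   # the Bool-specialized version is irrelevant here
```

So the existence of a boolean default value never "locks in" a slow path: each distinct argument-type combination gets its own compiled specialization.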