A couple of points that expand on Tom’s comments:

(1) We need to add Tom’s definition of countna(a::Array) = 0 so that we can 
show() wide DataFrames that contain any columns that are Vectors. I never use 
DataFrames like that, so I forgot that others might. It’s also impossible to 
produce such a DataFrame using our current I/O routines.
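
A minimal sketch of that fallback, written against current Base Julia rather 
than the 2014 DataArrays API: `missing` stands in for NA, and this countna is 
a local stand-in defined here, not the package's function. A column whose 
element type cannot hold a missing value reports zero without being scanned.

```julia
# Assumption: Base's `missing` plays the role of NA from the thread.
# `countna` below is a local stand-in, not the DataArrays function.
function countna(col::AbstractVector)
    # A column whose element type excludes Missing cannot contain one,
    # so return 0 without scanning it (the plain-Array case).
    Missing <: eltype(col) || return 0
    return count(ismissing, col)
end

# Per-column summary, similar in spirit to what show() needs for a wide table.
missing_per_column(cols) = [countna(col) for col in cols]
```

With this in place, a mix of plain and missing-aware columns can be 
summarized without converting the plain ones first.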

(2) The constructor you’re using does exist, Jacob, but you should typically 
pass in a Vector{Any}, each element of which is either a DataVector or 
PooledDataVector. See Point (3) for why, at the moment, using a Vector as a 
column is subtly broken.

(3) If people are going to put Vectors in DataFrames for performance reasons, 
all of our setindex!() functions for DataFrames need to add methods that 
automatically convert a Vector to a DataVector if an NA is inserted into it. 
Right now that kind of insertion is just going to error out. This check 
isn’t too hard, but it’s totally missing from our current codebase.
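
The check described in (3) could look roughly like the following. This is a 
sketch under stated assumptions, not the DataFrames implementation: it uses 
Base Julia's `missing` in place of NA, a plain Vector{Any} of columns in place 
of a DataFrame, and a hypothetical helper named set_cell! in place of the real 
setindex!() methods.

```julia
# Hypothetical helper: write `value` into row `i` of column `j`.
# Assumption: `cols` is a Vector{Any} of column vectors standing in for a
# DataFrame, and `missing` stands in for NA.
function set_cell!(cols::Vector{Any}, j::Integer, i::Integer, value)
    col = cols[j]
    if ismissing(value) && !(Missing <: eltype(col))
        # The column cannot hold a missing value, so swap in a widened copy
        # instead of letting the assignment throw.
        col = Vector{Union{Missing, eltype(col)}}(col)
        cols[j] = col
    end
    col[i] = value
    return cols
end
```

Columns that never receive a missing value keep their original element type, 
so the conversion cost is paid only by the columns that need it.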

Personally, I would prefer that we not allow any of the columns of a DataFrame 
to be Vectors. It’s a weird edge case that doesn’t actually offer reliable 
high performance, because the potential performance improvements rely on the 
unsafe assumption that a DataFrame won’t contain any columns with NAs in them.

 — John

On Jan 23, 2014, at 1:33 PM, Tom Short wrote:

> That works, but columns will be Arrays instead of DataArrays. That's
> the way it's always worked. If you want them to be DataArrays, then
> convert to DataArrays right at the end.
> 
> To fix show to support columns that are arrays, we probably need (at
> least) to define the following:
> 
> countna(da::Array) = 0
> 
> 
> 
> On Thu, Jan 23, 2014 at 4:07 PM, Jacob Quinn wrote:
>> Great investigative work. Is
>> DataFrames( array_of_arrays, Index(column_names_array) )
>> not the right way to hand construct DataFrames any more? I think I can
>> allocate DataArrays instead, but at every step of the way, I was trying to
>> hand-optimize the result fetching process, which resulted in not creating a
>> DataArray or DataFrame until right before we return to the user.
>> 
>> -Jacob
>> 
>> 
>> On Thu, Jan 23, 2014 at 3:27 PM, bp2012 wrote:
>>> 
>>> To check Jacob's suggestion about versions mismatch I completely removed
>>> the DataFrames and ODBC packages using Pkg.rm and physically deleted the
>>> directories from disk. I then added them via Pkg.add and Pkg.update.
>>> 
>>> I am running the julia nightlies build.
>>> julia> versioninfo()
>>> Julia Version 0.3.0-prerelease+1127
>>> Commit bc73674* (2014-01-22 20:09 UTC)
>>> 
>>> Pkg.status()
>>> - DataFrames                  0.5.1
>>> - ODBC                          0.3.5
>>> 
>>> Pkg.checkout("ODBC")
>>> INFO: Checking out ODBC master...
>>> INFO: Pulling ODBC latest master...
>>> INFO: No packages to install, update or remove
>>> 
>>> julia> Pkg.checkout("DataFrames")
>>> INFO: Checking out DataFrames master...
>>> INFO: Pulling DataFrames latest master...
>>> INFO: No packages to install, update or remove
>>> 
>>> I did some digging. It looks like there is a mismatch in that countna
>>> expects DataFrame columns to be DataArrays. However the ODBC package returns
>>> DataFrames that have array columns (using the first constructor in
>>> dataframe.jl). You guys would know better as to whether a change is needed
>>> in the constructor or if countna should also accept Array columns.
>>> 
>>> 
>>> I made some local changes to work around the issue.
>>> 
>>> show.jl:
>>> line 42:  if isna(col, i)     changed to  if isna(col[i])
>>> line 322:  missing[j] = countna(adf[j])     changed to    missing[j] =
>>> countna(isa(adf[j], DataArray) ? adf[j] : DataArray(adf[j]))
>>> 
>>> These work great for me.
>>> 
>> 
