Hey Gustavo,
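Below is a crack at a version that handles tuples (or any iterable with one
value per column) and deals with some of the issues John raised: it pushes
each value directly onto its column, with no map or anonymous functions, and
removes any partially added values if a push fails. You can see some simple
tests at
http://nbviewer.ipython.org/gist/catawbasam/003743259cf0a6ec968d.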
If you're interested in working it over for a pull request, please feel
free. If you'd like me to do it, I'd be happy to. And if this seems like
the wrong approach, that's fine too.
cheers,
Keith
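import Base.push!

# Append one row to df; `iterable` (e.g. a tuple or an array) must supply
# exactly one value per column.
function push!(df::DataFrame, iterable)
    K = length(iterable)
    assert(size(df, 2) == K)
    for (i, t) in enumerate(iterable)
        try
            push!(df.columns[i], t)
        catch
            # clean up the partial row so all columns keep the same length
            for j in 1:(i-1)
                pop!(df.columns[j])
            end
            msg = "Error adding $t to column $i."
            throw(ArgumentError(msg))
        end
    end
    df  # return the DataFrame, as push! on a collection conventionally does
end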
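Because the 2014 DataFrames API used above (`df.columns`, the `assert` function) may not run on current Julia, here is a minimal sketch of the same push-then-roll-back idea on plain vectors. The vector-of-columns layout and the name `push_row!` are hypothetical stand-ins for a DataFrame, not part of any package.

```julia
# Hypothetical stand-in for a DataFrame: a vector of typed column vectors.
# Push one value onto each column; if any push fails, undo the pushes
# already made so every column keeps the same length.
function push_row!(columns::Vector{AbstractVector}, row)
    length(row) == length(columns) ||
        throw(ArgumentError("row has $(length(row)) values, expected $(length(columns))"))
    for (i, value) in enumerate(row)
        try
            # push! on a typed Vector converts value to the column's
            # element type, or throws if no conversion exists
            push!(columns[i], value)
        catch
            # Roll back the columns that already received a value.
            for j in 1:(i - 1)
                pop!(columns[j])
            end
            throw(ArgumentError("cannot add $(repr(value)) to column $i"))
        end
    end
    return columns
end

cols = AbstractVector[Float64[1.5], String["a"]]
push_row!(cols, (2, "b"))        # the Int 2 is stored as 2.0
# push_row!(cols, (3.0, 4))      # would throw ArgumentError and leave cols unchanged
```

Checking lengths after a failed push is the quickest way to confirm the rollback left no ragged columns.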
On Monday, June 9, 2014 11:14:24 PM UTC-4, Gustavo Lacerda wrote:
>
> OK, but first I want to make it work for heterogeneous lists (tuples),
> where it is mysteriously failing.
>
> Gustavo
>
>
> On Monday, June 9, 2014, John Myles White <[email protected]> wrote:
> > Would be good to clean this up by removing some of the slow parts (map
> > usage, anonymous function usage) and have it submitted as a PR.
> > — John
> >
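John's suggestion of dropping `map` and the anonymous function can be sketched as an explicit loop that fills a preallocated `Vector{Any}`. Because the output container is untyped, a string column next to numeric ones is not a problem, and a failed `convert` can be reported per column, which also covers the `## To do: throw error if convert fails` comment in Gustavo's code. `convert_row` and its arguments are hypothetical and use plain Julia types rather than the 2014 DataFrames API.

```julia
# Hypothetical helper (not a DataFrames function): convert each value of
# `row` to the matching element type in `types` with a plain loop,
# instead of map plus an anonymous function.
function convert_row(types::AbstractVector, row)
    K = length(types)
    length(row) == K ||
        throw(ArgumentError("expected $K values, got $(length(row))"))
    converted = Vector{Any}(undef, K)   # untyped, so numbers and strings can share it
    for i in 1:K
        try
            converted[i] = convert(types[i], row[i])
        catch
            # Report which column failed instead of a bare MethodError.
            throw(ArgumentError("cannot convert $(repr(row[i])) to $(types[i]) (column $i)"))
        end
    end
    return converted
end

convert_row([Float64, Float64, String], (1, 2, "hi"))   # returns Any[1.0, 2.0, "hi"]
```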
> > On Jun 9, 2014, at 1:17 PM, Keith Campbell <[email protected]> wrote:
> >
> > Thanks for putting this together.
> > Under the 0.3 prerelease build from yesterday, I get a deprecation warning
> > in the Array version, on the line where df2 is assigned. The tweak below
> > appears to resolve that warning:
> > function push!(df::DataFrame, arr::Array)
> >     K = length(arr)
> >     assert(size(df,2)==K)
> >     col_types = map(eltype, eachcol(df))
> >     converted = map(i -> convert(col_types[i][1], arr[i]), 1:K)
> >     ## To do: throw error if convert fails
> >     df2 = convert( DataFrame, reshape(converted, 1, K) ) # <==tweaked
> >     names!(df2, names(df))
> >     append!(df,df2)
> > end
> > On Monday, June 9, 2014 3:44:28 PM UTC-4, Gustavo Lacerda wrote:
> >
> > I've implemented this:
> >
> > function push!(df::DataFrame, arr::Array)
> >     K = length(arr)
> >     assert(size(df,2)==K)
> >     col_types = map(eltype, eachcol(df))
> >     converted = map(i -> convert(col_types[i][1], arr[i]), 1:K)
> >     ## To do: throw error if convert fails
> >     df2 = DataFrame(reshape(converted, 1, K))
> >     names!(df2, names(df))
> >     append!(df,df2)
> > end
> > using DataFrames, Distributions   # Normal comes from Distributions
> > X1 = rand(Normal(0,1), 10)
> > X2 = rand(Normal(0,1), 10)
> > X3 = rand(Normal(0,1), 10)
> > Y = X1 - X2 + rand(Normal(0,1), 10)
> > df = DataFrame(Y=Y, X1=X1, X2=X2, X3=X3)
> > push!(df, [1,2,3,4])
> >
> > I tried to generalize it by replacing Array with Tuple.
> >
> > function push!(df::DataFrame, tup::Tuple)
> >     K = length(tup)
> >     assert(size(df,2)==K)
> >     col_types = map(eltype, eachcol(df))
> >     converted = map(i -> convert(col_types[i][1], tup[i]), 1:K)
> >     ## To do: throw error if convert fails
> >     df2 = DataFrame(reshape(converted, 1, K))
> >     names!(df2, names(df))
> >     append!(df,df2)
> > end
> > julia> df[:greeting] = "hello"
> > "hello"
> > julia> df
> > 11x5 DataFrame
> > |-------|-----------|-------------|-----------|------------|----------|
> > | Row # | Y         | X1          | X2        | X3         | greeting |
> > | 1     | 0.39624   | 0.163897    | -0.146526 | 0.592489   | "hello"  |
> > | 2     | -0.236239 | -1.81627    | -0.726978 | 0.638524   | "hello"  |
> > | 3     | -0.801656 | 0.000801096 | 0.543645  | -0.997613  | "hello"  |
> > | 4     | -0.30888  | -0.166953   | 0.640827  | 1.53217    | "hello"  |
> > | 5     | -0.662719 | -1.38129    | -0.194937 | 0.928446   | "hello"  |
> > | 6     | 4.37102   | 2.22107     | -2.15648  | -0.703392  | "hello"  |
> > | 7     | 0.0866397 | -0.633333   | -0.745456 | -0.0144429 | "hello"  |
> > | 8     | 0.581942  | 1.24061     | -0.867256 | 0.283671   | "hello"  |
> > | 9     | -3.15614  | -1.39045    | 1.34395   | 0.343224   | "hello"  |
> > | 10    | -1.67029  | 0.634846    | 2.08062   | -0.845479  | "hello"  |
> > | 11    | 1.0       | 2.0         | 3.0       | 4.0        | "hello"  |
> >
> > But then this happens:
> > julia> push!(df, (1,2,3,4, "hi"))
> > ERROR: no method convert(Type{Float64}, ASCIIString)
> > in setindex! at array.jl:305
> > in map_range_to! at range.jl:523
> > in map at range.jl:534
> > in push! at none:5
> >
> > It apparently tries to convert "hi" to Float64, even though the type of the
> > 5th column is ASCIIString:
> > julia> col_types
> > 1x5 DataFrame
> > |-------|---------|---------|---------|---------|-------------|
> > | Row # | Y       | X1      | X2      | X3      | label       |
> > | 1     | Float64 | Float64 | Float64 | Float64 | ASCIIString |
> >
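The stack trace points at `setindex!` inside `map_range_to!`, which suggests the `Float64` conversion comes from `map`'s own result array rather than from `col_types`. As far as I can tell, `map` over a range in Julia 0.3 created its output array with the type of the first result (`Float64` here) and then failed when storing the already-converted `"hi"` into it. Below is a plain-Julia sketch of that storage step; the two-value `results` array is an illustrative stand-in for the converted tuple. Current Julia's `map` widens its output type instead, so this exact failure should not reproduce there.

```julia
# Two-column analog of the failing call: the values have already been
# converted to their column types (Float64 and String).
results = Any[1.0, "hi"]

# Per the stack trace, Julia 0.3's map(f, range) typed its output array
# from the first result, then stored the remaining results into it.
out = Vector{typeof(results[1])}(undef, length(results))   # Vector{Float64}
failed = try
    for i in eachindex(results)
        out[i] = results[i]   # storing "hi" needs convert(Float64, "hi"), which has no method
    end
    false
catch err
    err isa MethodError
end

# An untyped output array, filled by an explicit loop, holds both values.
fixed = Vector{Any}(undef, length(results))
for i in eachindex(results)
    fixed[i] = results[i]
end
```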
> > Gustavo
> > P.S. Should the code go here?
>
> --
> Gustavo Lacerda
> http://www.optimizelife.com
>