Would be good to clean this up by removing some of the slow parts (map usage, anonymous function usage) and have it submitted as a PR.
-- John

On Jun 9, 2014, at 1:17 PM, Keith Campbell wrote:

> Thanks for putting this together.
> Under 0.3 pre from yesterday, I get a deprecation warning in the Array
> version where df2 is assigned. The tweak below appears to resolve that
> warning:
>
> function push!(df::DataFrame, arr::Array)
>     K = length(arr)
>     assert(size(df,2)==K)
>     col_types = map(eltype, eachcol(df))
>     converted = map(i -> convert(col_types[i][1], arr[i]), 1:K)
>     ## To do: throw error if convert fails
>     df2 = convert( DataFrame, reshape(converted, 1, K) ) # <==tweaked
>     names!(df2, names(df))
>     append!(df,df2)
> end
>
> On Monday, June 9, 2014 3:44:28 PM UTC-4, Gustavo Lacerda wrote:
> I've implemented this:
>
> function push!(df::DataFrame, arr::Array)
>     K = length(arr)
>     assert(size(df,2)==K)
>     col_types = map(eltype, eachcol(df))
>     converted = map(i -> convert(col_types[i][1], arr[i]), 1:K)
>     ## To do: throw error if convert fails
>     df2 = DataFrame(reshape(converted, 1, K))
>     names!(df2, names(df))
>     append!(df,df2)
> end
>
> X1 = rand(Normal(0,1), 10); X2 = rand(Normal(0,1), 10); X3 =
> rand(Normal(0,1), 10); Y = X1 - X2 + rand(Normal(0,1), 10)
> df = DataFrame(Y=Y, X1=X1, X2=X2, X3=X3)
> push!(df, [1,2,3,4])
>
>
> I tried to generalize it by replacing Array with Tuple.
>
>
> function push!(df::DataFrame, tup::Tuple)
>     K = length(tup)
>     assert(size(df,2)==K)
>     col_types = map(eltype, eachcol(df))
>     converted = map(i -> convert(col_types[i][1], tup[i]), 1:K)
>     ## To do: throw error if convert fails
>     df2 = DataFrame(reshape(converted, 1, K))
>     names!(df2, names(df))
>     append!(df,df2)
> end
>
> julia> df[:greeting] = "hello"
> "hello"
>
> julia> df
> 11x5 DataFrame
> |-------|-----------|-------------|-----------|------------|----------|
> | Row # | Y         | X1          | X2        | X3         | greeting |
> | 1     | 0.39624   | 0.163897    | -0.146526 | 0.592489   | "hello"  |
> | 2     | -0.236239 | -1.81627    | -0.726978 | 0.638524   | "hello"  |
> | 3     | -0.801656 | 0.000801096 | 0.543645  | -0.997613  | "hello"  |
> | 4     | -0.30888  | -0.166953   | 0.640827  | 1.53217    | "hello"  |
> | 5     | -0.662719 | -1.38129    | -0.194937 | 0.928446   | "hello"  |
> | 6     | 4.37102   | 2.22107     | -2.15648  | -0.703392  | "hello"  |
> | 7     | 0.0866397 | -0.633333   | -0.745456 | -0.0144429 | "hello"  |
> | 8     | 0.581942  | 1.24061     | -0.867256 | 0.283671   | "hello"  |
> | 9     | -3.15614  | -1.39045    | 1.34395   | 0.343224   | "hello"  |
> | 10    | -1.67029  | 0.634846    | 2.08062   | -0.845479  | "hello"  |
> | 11    | 1.0       | 2.0         | 3.0       | 4.0        | "hello"  |
>
>
> But then this happens:
>
> julia> push!(df, (1,2,3,4, "hi"))
> ERROR: no method convert(Type{Float64}, ASCIIString)
>  in setindex! at array.jl:305
>  in map_range_to! at range.jl:523
>  in map at range.jl:534
>  in push! at none:5
>
>
> It apparently tries to convert "hi" to Float64, even though the 5th type is
> ASCIIString:
>
> julia> col_types
> 1x5 DataFrame
> |-------|---------|---------|---------|---------|-------------|
> | Row # | Y       | X1      | X2      | X3      | label       |
> | 1     | Float64 | Float64 | Float64 | Float64 | ASCIIString |
>
>
> Gustavo
>
> P.S. Should the code go here?
> https://github.com/JuliaStats/DataFrames.jl/blob/master/src/dataframe/dataframe.jl
>
>
>
> On Friday, June 6, 2014 5:16:11 PM UTC-4, John Myles White wrote:
> You're right: any iterable could work.
>
> Personally, I tend to minimize the use of functionality that depends upon the
> columns of a DataFrame being in a specific order. It's certainly useful in
> many cases, so we can't get rid of it. But I'm not excited about people
> writing a lot more code that depends upon order than they do now.
>
> -- John
>
> On Jun 6, 2014, at 1:07 PM, Ivar Nesje wrote:
>
>> Why can't any iterable (of the correct length) be accepted?
>>
>> As long as the DataFrame has predefined types on the columns, it is just a
>> matter of asserting or converting the type and copying it in. Convert would
>> probably be slower because the types would be unknown and it would have to
>> dispatch dynamically to the right convert method.
>>
>> On Friday, June 6, 2014 at 18:58:51 UTC+2, John Myles White wrote:
>> Yeah, I just dislike the gratuitous multiplicity of ways to do the same
>> thing.
>>
>> -- John
>>
>> On Jun 6, 2014, at 9:55 AM, Stefan Karpinski wrote:
>>
>>> Since all three can be indexed the same way, it seems like that should be a
>>> minimal annoyance, no?
>>>
>>> On Friday, June 6, 2014, John Myles White wrote:
>>> The thing that annoys me about arrays is that we arguably need to accept
>>> both vectors and 1-row matrices as inputs.
>>>
>>> -- John
>>>
>>> On Jun 6, 2014, at 9:20 AM, Stefan Karpinski wrote:
>>>
>>>> See also https://github.com/JuliaStats/DataFrames.jl/issues/585. Using a
>>>> tuple may make more sense, but it probably wouldn't hurt to allow an array
>>>> as well.
>>>>
>>>> On Friday, June 6, 2014, John Myles White wrote:
>>>> If someone wants to submit a PR to allow adding a tuple as a row to a
>>>> DataFrame, I’ll merge it.
>>>>
>>>> -- John
>>>>
>>>> On May 28, 2014, at 7:43 AM, John Myles White wrote:
>>>>
>>>>> I’m happy with using tuples since that will make it easier to construct
>>>>> DataFrames from iterators.
>>>>>
>>>>> -- John
>>>>>
>>>>> On May 27, 2014, at 11:37 PM, Tomas Lycken wrote:
>>>>>
>>>>>> I like it - but maybe that wasn't so hard to guess I would ;)
>>>>>>
>>>>>> // T
>>>>>>
>>>>>> On Tuesday, May 27, 2014 10:11:15 PM UTC+2, Jacques Rioux wrote:
>>>>>> Let me add a thought here. I also think that adding a row to a dataframe
>>>>>> should be easier. However, I do not think that an array would be the
>>>>>> best container to represent a row because array members must all be of
>>>>>> the same type, which leaves Any as the only option in your example.
>>>>>>
>>>>>> I think that appending or pushing a tuple with the right types could be
>>>>>> made to work.
>>>>>>
>>>>>> So it would be
>>>>>>
>>>>>> julia> push!(psispread, (1.0,0.1,:Fake))
>>>>>>
>>>>>> or
>>>>>>
>>>>>> julia> append!(psispread, (1.0,0.1,:Fake))
>>>>>>
>>>>>> since
>>>>>>
>>>>>> julia> typeof((1.0, 0.1, :fake))
>>>>>> (Float64,Float64,Symbol)
>>>>>>
>>>>>> Note, I am not saying that this works now but that it could be made to
>>>>>> work by adding the corresponding method to either function. It seems it
>>>>>> is the right construct.
>>>>>>
>>>>>> Any thoughts?

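As a rough illustration of the cleanup suggested at the top of the thread, here is a minimal sketch that converts each value with an explicit loop instead of map and an anonymous function. It is not the DataFrames.jl implementation: `push_row!` is a hypothetical helper, a plain vector of typed column vectors stands in for the DataFrame, and the syntax assumes a current Julia release rather than 0.3. Collecting the converted values in a `Vector{Any}` also sidesteps what the stack trace in the quoted message hints at (this is a reading of the trace, not something confirmed in the thread): `map` over `1:K` seems to have typed its result array from the first converted value, a Float64, and then failed to store the string.

```julia
# Hypothetical helper: `columns` is a stand-in for a DataFrame,
# modeled here as a vector of typed column vectors.
function push_row!(columns::Vector, row)
    length(row) == length(columns) ||
        throw(ArgumentError("expected $(length(columns)) values, got $(length(row))"))

    # Convert every value first, into an untyped buffer, so that mixed
    # column types (Float64 next to String) never share a typed array.
    converted = Vector{Any}(undef, length(columns))
    for (i, value) in enumerate(row)
        converted[i] = convert(eltype(columns[i]), value)
    end

    # Only touch the columns once all conversions have succeeded.
    for (i, value) in enumerate(converted)
        push!(columns[i], value)
    end
    return columns
end

# Two columns: one Float64, one String.
columns = Any[Float64[0.5, 1.5], String["hello", "hello"]]

# Works for a tuple; any indexable iterable of the right length would do.
push_row!(columns, (1, "hi"))
```

Because all conversions run before the first `push!`, a value that cannot be converted raises an error without leaving the columns at different lengths, which is one way to address the "throw error if convert fails" to-do in the quoted code.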