Re: [julia-users] Adding a row to a DataFrame

John Myles White Mon, 09 Jun 2014 19:42:06 -0700

It would be good to clean this up by removing some of the slow parts (map usage, 
anonymous function usage) and then submit it as a PR.


 — John
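
One way to read the suggestion above, as a minimal sketch in plain Julia without DataFrames: replace the map call and the anonymous function with an explicit loop that converts each value to its column's element type before anything is mutated. The names `push_row!` and `cols` are hypothetical. A vector of typed column vectors stands in for a DataFrame here, and the syntax is that of current Julia (1.x), not the 0.3 prerelease discussed in the thread.

```julia
# Hypothetical stand-in for a DataFrame: one typed vector per column.
# push_row! is an illustrative name, not a DataFrames.jl function.
function push_row!(cols::Vector{<:AbstractVector}, row)
    K = length(cols)
    length(row) == K || throw(ArgumentError("expected $K values, got $(length(row))"))

    # Convert every value first so a failed conversion leaves cols untouched.
    converted = Vector{Any}(undef, K)
    for (i, value) in enumerate(row)
        converted[i] = convert(eltype(cols[i]), value)
    end

    # All conversions succeeded; now append one value per column.
    for i in 1:K
        push!(cols[i], converted[i])
    end
    return cols
end

cols = AbstractVector[[0.5, 1.5], ["a", "b"]]
push_row!(cols, (1, "hi"))      # tuple row; the Int is converted to Float64
push_row!(cols, Any[2, "yo"])   # array row works the same way
```

Because every conversion happens before the first `push!`, a row that cannot be converted raises an error without leaving the columns at different lengths, which is one possible answer to the "throw error if convert fails" to-do in the quoted code.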

On Jun 9, 2014, at 1:17 PM, Keith Campbell wrote:

> Thanks for putting this together.
> Under 0.3 pre from yesterday, I get a deprecation warning in the Array 
> version where df2 is assigned.  The tweak below appears to resolve that 
> warning:
> 
> function push!(df::DataFrame, arr::Array)
>     K = length(arr)
>     assert(size(df,2)==K)
>     col_types = map(eltype, eachcol(df))
>     converted = map(i -> convert(col_types[i][1], arr[i]), 1:K)
>     ## To do: throw error if convert fails
>     df2 = convert( DataFrame, reshape(converted, 1, K) )   # <==tweaked
>     names!(df2, names(df))
>     append!(df,df2)
> end
> 
> On Monday, June 9, 2014 3:44:28 PM UTC-4, Gustavo Lacerda wrote:
> I've implemented this:
> 
> function push!(df::DataFrame, arr::Array)
>     K = length(arr)
>     assert(size(df,2)==K)
>     col_types = map(eltype, eachcol(df))
>     converted = map(i -> convert(col_types[i][1], arr[i]), 1:K)
>     ## To do: throw error if convert fails
>     df2 = DataFrame(reshape(converted, 1, K))
>     names!(df2, names(df))
>     append!(df,df2)
> end
> 
> X1 = rand(Normal(0,1), 10); X2 = rand(Normal(0,1), 10)
> X3 = rand(Normal(0,1), 10); Y = X1 - X2 + rand(Normal(0,1), 10)
> df = DataFrame(Y=Y, X1=X1, X2=X2, X3=X3)
> push!(df, [1,2,3,4])
> 
> 
> I tried to generalize it by replacing Array with Tuple.
> 
> 
> function push!(df::DataFrame, tup::Tuple)
>     K = length(tup)
>     assert(size(df,2)==K)
>     col_types = map(eltype, eachcol(df))
>     converted = map(i -> convert(col_types[i][1], tup[i]), 1:K)
>     ## To do: throw error if convert fails
>     df2 = DataFrame(reshape(converted, 1, K))
>     names!(df2, names(df))
>     append!(df,df2)
> end
> 
> julia> df[:greeting] = "hello"
> "hello"
> 
> julia> df
> 11x5 DataFrame
> |-------|-----------|-------------|-----------|------------|----------|
> | Row # | Y         | X1          | X2        | X3         | greeting |
> | 1     | 0.39624   | 0.163897    | -0.146526 | 0.592489   | "hello"  |
> | 2     | -0.236239 | -1.81627    | -0.726978 | 0.638524   | "hello"  |
> | 3     | -0.801656 | 0.000801096 | 0.543645  | -0.997613  | "hello"  |
> | 4     | -0.30888  | -0.166953   | 0.640827  | 1.53217    | "hello"  |
> | 5     | -0.662719 | -1.38129    | -0.194937 | 0.928446   | "hello"  |
> | 6     | 4.37102   | 2.22107     | -2.15648  | -0.703392  | "hello"  |
> | 7     | 0.0866397 | -0.633333   | -0.745456 | -0.0144429 | "hello"  |
> | 8     | 0.581942  | 1.24061     | -0.867256 | 0.283671   | "hello"  |
> | 9     | -3.15614  | -1.39045    | 1.34395   | 0.343224   | "hello"  |
> | 10    | -1.67029  | 0.634846    | 2.08062   | -0.845479  | "hello"  |
> | 11    | 1.0       | 2.0         | 3.0       | 4.0        | "hello"  |
> 
> 
> But then this happens:
> 
> julia> push!(df, (1,2,3,4, "hi"))
> ERROR: no method convert(Type{Float64}, ASCIIString)
>  in setindex! at array.jl:305
>  in map_range_to! at range.jl:523
>  in map at range.jl:534
>  in push! at none:5
> 
> 
> It apparently tries to convert "hi" to Float64, even though the 5th type is 
> ASCIIString:
> 
> julia> col_types
> 1x5 DataFrame
> |-------|---------|---------|---------|---------|-------------|
> | Row # | Y       | X1      | X2      | X3      | label       |
> | 1     | Float64 | Float64 | Float64 | Float64 | ASCIIString |
> 
> 
> Gustavo
> 
> P.S.  Should the code go here?  
> https://github.com/JuliaStats/DataFrames.jl/blob/master/src/dataframe/dataframe.jl
> 
> 
> 
> On Friday, June 6, 2014 5:16:11 PM UTC-4, John Myles White wrote:
> You're right: any iterable could work.
> 
> Personally, I tend to minimize the use of functionality that depends upon the 
> columns of a DataFrame being in a specific order. It's certainly useful in 
> many cases, so we can't get rid of it. But I'm not excited about people 
> writing a lot more code that depends upon order than they do now.
> 
>  -- John
> 
> On Jun 6, 2014, at 1:07 PM, Ivar Nesje wrote:
> 
>> Why can't any iterable (of the correct length) be accepted?
>> 
>> As long as the DataFrame has predefined types on the columns, it is just a 
>> matter of asserting or converting the type and copying it in. Convert would 
>> probably be slower because the types would be unknown and it would have to 
>> dispatch dynamically to the right convert method.
>> 
>> On Friday, June 6, 2014 at 18:58:51 UTC+2, John Myles White wrote:
>> Yeah, I just dislike the gratuitous multiplicity of ways to do the same 
>> thing.
>> 
>>  -- John
>> 
>> On Jun 6, 2014, at 9:55 AM, Stefan Karpinski wrote:
>> 
>>> Since all three can be indexed the same way, it seems like that should be a 
>>> minimal annoyance, no?
>>> 
>>> On Friday, June 6, 2014, John Myles White wrote:
>>> The thing that annoys me about arrays is that we arguably need to accept 
>>> both vectors and 1-row matrices as inputs.
>>> 
>>>  -- John
>>> 
>>> On Jun 6, 2014, at 9:20 AM, Stefan Karpinski wrote:
>>> 
>>>> See also https://github.com/JuliaStats/DataFrames.jl/issues/585. Using a 
>>>> tuple may make more sense, but it probably wouldn't hurt to allow an array 
>>>> as well.
>>>> 
>>>> On Friday, June 6, 2014, John Myles White wrote:
>>>> If someone wants to submit a PR to allow adding a tuple as a row to a 
>>>> DataFrame, I’ll merge it.
>>>> 
>>>>  — John
>>>> 
>>>> On May 28, 2014, at 7:43 AM, John Myles White wrote:
>>>> 
>>>>> I’m happy with using tuples since that will make it easier to construct 
>>>>> DataFrames from iterators.
>>>>> 
>>>>>  — John
>>>>> 
>>>>> On May 27, 2014, at 11:37 PM, Tomas Lycken wrote:
>>>>> 
>>>>>> I like it - but maybe that wasn't so hard to guess I would ;)
>>>>>> 
>>>>>> // T
>>>>>> 
>>>>>> On Tuesday, May 27, 2014 10:11:15 PM UTC+2, Jacques Rioux wrote:
>>>>>> Let me add a thought here. I also think that adding a row to a dataframe 
>>>>>> should be easier. However, I do not think that an array would be the 
>>>>>> best container to represent a row because array members must all be of 
>>>>>> the same type, which leaves Any as the only option in your example.
>>>>>> 
>>>>>> I think that appending or pushing a tuple with the right types could be 
>>>>>> made to work. 
>>>>>> 
>>>>>> So it would be 
>>>>>> 
>>>>>> julia> push!(psispread, (1.0,0.1,:Fake))
>>>>>> 
>>>>>> or
>>>>>> 
>>>>>> julia> append!(psispread, (1.0,0.1,:Fake))
>>>>>> 
>>>>>> since 
>>>>>> 
>>>>>> julia> typeof((1.0, 0.1, :fake))
>>>>>> (Float64,Float64,Symbol)
>>>>>> 
>>>>>> Note, I am not saying that this works now but that it could be made to 
>>>>>> work by adding the corresponding method to either function. It seems it 
>>>>>> is the right construct.
>>>>>> 
>>>>>> Any thoughts?
>>>>> 
>>>> 
>>> 
>> 
>
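
On the `convert(Type{Float64}, ASCIIString)` error quoted above: the trace points at `setindex!` inside `map`, not at the anonymous function. That suggests (an inference from the trace, not something stated in the thread) that `map` allocated a Float64 result array based on the first converted value and then failed to store the string in it. Current Julia determines the element type of `map` results differently, so the sketch below reproduces only the underlying failure, storing a string into a Float64 array, and shows that an `Any` container avoids it.

```julia
# A Float64 array cannot hold a String: setindex! calls convert and fails.
out = Vector{Float64}(undef, 2)
out[1] = 1.0
err = try
    out[2] = "hi"
    nothing
catch e
    e
end
println(typeof(err))   # MethodError

# An Any-typed container accepts mixed element types.
mixed = Vector{Any}(undef, 2)
mixed[1] = 1.0
mixed[2] = "hi"
println(mixed)
```

If that reading is right, the per-column conversion in the quoted `push!` was correct for the fifth column, and the fix lies in collecting the converted values into a container that allows mixed types.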
