Re: [julia-users] Merging dataframes

John Myles White Sun, 26 Jan 2014 13:28:55 -0800

For now, I’d recommend using this function:

function Base.append!(adf1::AbstractDataFrame,
                      adf2::AbstractDataFrame)
    names(adf1) == names(adf2) || error("Column names do not match")
    types(adf1) == types(adf2) || error("Column types do not match")
    ncols = size(adf1, 2)
    # TODO: This needs to be a sort of transaction to be 100% safe: if an
    # append! fails partway through, the columns end up with unequal lengths.
    for j in 1:ncols
        append!(adf1[j], adf2[j])
    end
    return adf1
end


We’ll add something like this to DataFrames soon.

 — John
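
The TODO in the function above concerns partial failure: if one column's append! throws after earlier columns have already grown, the columns are left with unequal lengths. One way this might be handled is to record each column's length up front and truncate back to it on error. The following is only a minimal sketch under stated assumptions, not DataFrames code: ColumnTable and safe_append! are hypothetical names invented for illustration, and columns are assumed to be plain Vectors.

```julia
# Hypothetical stand-in for a data frame: column names plus one Vector per column.
# This is NOT a DataFrames.jl type; it only illustrates the rollback idea.
struct ColumnTable
    names::Vector{Symbol}
    columns::Vector{Vector}
end

# Append the rows of t2 to t1 in place, column by column.
# If any append! throws, every column of t1 is truncated back to its
# original length, so t1 is left unchanged.
function safe_append!(t1::ColumnTable, t2::ColumnTable)
    t1.names == t2.names || error("Column names do not match")
    map(eltype, t1.columns) == map(eltype, t2.columns) ||
        error("Column types do not match")

    # Remember where each column ended before touching anything.
    original_lengths = map(length, t1.columns)
    try
        for j in eachindex(t1.columns)
            append!(t1.columns[j], t2.columns[j])
        end
    catch
        # Roll back: the first original_lengths[j] elements were never modified.
        for j in eachindex(t1.columns)
            resize!(t1.columns[j], original_lengths[j])
        end
        rethrow()
    end
    return t1
end

# Usage: t1 grows in place; t2 is left untouched.
t1 = ColumnTable([:a, :b], [[1, 2], ["x", "y"]])
t2 = ColumnTable([:a, :b], [[3], ["z"]])
safe_append!(t1, t2)
```

Because the name and type checks run before any column is modified, a mismatch leaves t1 untouched; the try/catch only matters for failures during the appends themselves, such as running out of memory.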

On Jan 26, 2014, at 1:14 PM, Joosep Pata <[email protected]> wrote:

> Thanks! It seems to work quite well using append!(d1::DataArray, 
> d2::DataArray) from DataArrays.jl trunk. If I had a time machine to extend 
> the Sunday, I’d work on a proper version of vcat using that, but alas. I 
> appreciate the amazing work that has gone into Julia, HDF5.jl and 
> DataFrames.jl. :)
> 
> On 26 Jan 2014, at 21:55, John Myles White <[email protected]> wrote:
> 
>> This is quite close to being possible, but we’re missing a few things.
>> 
>> Daniel Jones recently added an append! method to DataArrays, which would let 
>> you do this column-by-column.
>> 
>> To help you out, we need to add an append! method to DataFrames as well. 
>> I’ve wanted that badly myself lately.
>> 
>> I will try to get to this today, but am already pretty overwhelmed with work 
>> for the day.
>> 
>> — John
>> 
>> On Jan 26, 2014, at 11:02 AM, Joosep Pata <[email protected]> wrote:
>> 
>>> Is there a way to avoid copying when doing vcat(df1::DataFrame, 
>>> df2::DataFrame, …)? I’m trying to open hundreds of files with DataFrames, 
>>> merge all of them and save a single ~150M row x 100 col DataFrame using 
>>> HDF5 and JLD (to be opened later using mmap), and it seems to work 
>>> marvelously, apart from the vcat.
>>> Does a no-copy option exist? I’m aware of DataStreams as a concept, but as 
>>> I understand, they’re not fully fleshed out yet.
>> 
>
