For now, I’d recommend using this function:
    function Base.append!(adf1::AbstractDataFrame,
                          adf2::AbstractDataFrame)
        names(adf1) == names(adf2) || error("Column names do not match")
        types(adf1) == types(adf2) || error("Column types do not match")
        ncols = size(adf1, 2)
        # TODO: This needs to be a sort of transaction to be 100% safe
        for j in 1:ncols
            append!(adf1[j], adf2[j])
        end
        return adf1
    end
We’ll add something like this to DataFrames soon.
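
On the TODO about making this a transaction: one possible approach is to record each column's length before appending and shrink every column back if any append fails. The following is only a rough sketch under stated assumptions, not what DataFrames does or will do. It works on a plain vector of column vectors rather than on a DataFrame, and the name append_columns! is hypothetical.

```julia
# Hypothetical helper, not part of DataFrames or DataArrays.
# Appends src[j] onto dest[j] for every column j. If any append throws,
# all columns of dest are resized back to their original lengths.
function append_columns!(dest::Vector, src::Vector)
    length(dest) == length(src) || error("Column counts do not match")
    oldlens = map(length, dest)      # lengths to restore on failure
    try
        for j in 1:length(dest)
            append!(dest[j], src[j])
        end
    catch
        # Roll back: drop anything appended so far, including partial writes.
        for j in 1:length(dest)
            resize!(dest[j], oldlens[j])
        end
        rethrow()
    end
    return dest
end
```

Columns that were never touched keep their length, so the resize! is a no-op for them. This only guards against a failure partway through the loop; it does not make the operation safe against concurrent access.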
— John
On Jan 26, 2014, at 1:14 PM, Joosep Pata wrote:
> Thanks! It seems to work quite well using append!(d1::DataArray,
> d2::DataArray) from DataArrays.jl trunk. If I had a time machine to extend
> the Sunday, I’d work on a proper version of vcat using that, but alas. I
> appreciate the amazing work that has gone into Julia, HDF5.jl and
> DataFrames.jl. :)
>
> On 26 Jan 2014, at 21:55, John Myles White wrote:
>
>> This is quite close to being possible, but we’re missing a few things.
>>
>> Daniel Jones recently added an append! method to DataArrays, which would let
>> you do this column-by-column.
>>
>> To help you out, we need to add an append! method to DataFrames as well.
>> I’ve wanted that badly myself lately.
>>
>> I will try to get to this today, but am already pretty overwhelmed with work
>> for the day.
>>
>> — John
>>
>> On Jan 26, 2014, at 11:02 AM, Joosep Pata wrote:
>>
>>> Is there a way to avoid copying when doing vcat(df1::DataFrame,
>>> df2::DataFrame, …)? I’m trying to open hundreds of files with DataFrames,
>>> merge all of them and save a single ~150M row x 100 col DataFrame using
>>> HDF5 and JLD (to be opened later using mmap), and it seems to work
>>> marvelously, apart from the vcat.
>>> Does a no-copy option exist? I’m aware of DataStreams as a concept, but as
>>> I understand, they’re not fully fleshed out yet.
>>
>