I've created a Jira issue to track implementing rbind() for Arrow Tables in R:
https://issues.apache.org/jira/browse/ARROW-15989
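
Until that lands, the union-dataset workaround quoted below can be tried end
to end with something like the following. This is only a minimal sketch: the
temporary files and the column names t and x are made up for illustration,
and it assumes dplyr is installed alongside arrow so that collect() is
available.

```r
library(arrow)
library(dplyr)  # provides the collect() generic

# Hypothetical stand-ins for one "old" and one "new" Feather file
# that share the same schema.
old_path <- tempfile(fileext = ".feather")
new_path <- tempfile(fileext = ".feather")
write_feather(data.frame(t = 1:3, x = c(10, 20, 30)), old_path)
write_feather(data.frame(t = 4:5, x = c(40, 50)), new_path)

# Open each file as a Dataset, then combine them into a union dataset.
ds1 <- open_dataset(old_path, format = "feather")
ds2 <- open_dataset(new_path, format = "feather")
ds <- c(ds1, ds2)

# collect() reads the union into an R data frame; do not rely on
# the row order across the two files without sorting explicitly.
combined <- collect(ds)
```

Note that collect() pulls the rows into an R data frame rather than
returning an Arrow Table, so this is a stopgap and not a zero-copy rbind.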

On Mon, Mar 21, 2022 at 12:15 PM Will Jones <will.jones...@gmail.com> wrote:

> Hi Andrew,
>
> I don't think we've implemented rbind yet, unfortunately. We've just
> implemented concat_arrays (also bound to c()) [1], and that will be
> available in the next release (or nightlies right now).
>
> One way you could "rbind" multiple feather files, if they have the
> same schema, is by constructing a union dataset out of the two or more
> files. This would look something like this:
>
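> > library(dplyr)  # provides collect()
> > ds1 <- arrow::open_dataset("file1.feather", format="feather")
> > ds2 <- arrow::open_dataset("file2.feather", format="feather")
> > ds <- c(ds1, ds2)  # union dataset over both files
> > my_table <- collect(ds)  # reads the rows into an R data frame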
>
> [1] https://github.com/apache/arrow/pull/12324
>
> On Mon, Mar 21, 2022 at 11:49 AM Andrew Piskorski <a...@piskorski.com>
> wrote:
>
>> Hi, I am using the latest R arrow package from CRAN, 7.0.0.
>>
>>   https://cran.r-project.org/web/packages/arrow/
>>
>> What is the right way to concatenate rows from two Arrow Tables
>> together into one Table?  AKA, rbind() in base R.  Can I do this as a
>> zero-copy view in memory, or will I need to write the new Table to
>> disk first before I can use it?
>>
>> Right now I'm primarily concerned with doing this in R, but if it can
>> be done in a better way using the Arrow C++ or other libraries, I'm
>> definitely interested in understanding that too.
>>
>> It sounds like pyarrow.concat_tables() is the right tool for this, but
>> I don't think there's currently any R equivalent.
>>
>> https://arrow.apache.org/docs/python/generated/pyarrow.concat_tables.html#pyarrow.concat_tables
>>
>> My motivation here is that I will have large amounts of older
>> data on disk, which changes only rarely, plus smaller amounts of newer
>> data changing more frequently.  I can store both the old and new data
>> on disk in Arrow Feather files, so far so good.
>>
>> To analyze the data, I can simply mmap both one old and one newer file
>> with arrow::read_feather(), find where their time-series of rows
>> overlap, and use e.g. myTable$Slice() to select just the rows I
>> want from each of the two Tables.  So far so good.  But then how
>> should I properly combine them into one Table, for further analysis
>> downstream?
>>
>> Thanks for your help!
>>
>> --
>> Andrew Piskorski <a...@piskorski.com>
>>
>
