Hi Andrew,

I don't think we've implemented rbind yet, unfortunately. We've just implemented concat_arrays() (also bound to c()) [1], and that will be available in the next release (or in the nightlies right now).
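Once that lands, concatenation at the Array level should look roughly like this (a sketch against the nightly API from [1]; note that concat_arrays() operates on Arrays, not on whole Tables):

> a1 <- arrow::Array$create(1:3)
> a2 <- arrow::Array$create(4:6)
> arrow::concat_arrays(a1, a2)   # or, equivalently: c(a1, a2)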
One way you could "rbind" multiple Feather files, if they have the same schema, is by constructing a union dataset out of the two or more files. That would look something like this:

> ds1 <- arrow::open_dataset("file1.feather", format = "feather")
> ds2 <- arrow::open_dataset("file2.feather", format = "feather")
> ds <- c(ds1, ds2)
> my_table <- collect(ds)

[1] https://github.com/apache/arrow/pull/12324

On Mon, Mar 21, 2022 at 11:49 AM Andrew Piskorski <a...@piskorski.com> wrote:
> Hi, I am using the latest R arrow package from CRAN, 7.0.0.
>
> https://cran.r-project.org/web/packages/arrow/
>
> What is the right way to concatenate rows from two Arrow Tables
> together into one Table? AKA, rbind() in base R. Can I do this as a
> zero-copy view in memory, or will I need to write the new Table to
> disk first before I can use it?
>
> Right now I'm primarily concerned with doing this in R, but if it can
> be done in a better way using the Arrow C++ or other libraries, I'm
> definitely interested in understanding that too.
>
> It sounds like pyarrow.concat_tables() is the right tool for this, but
> I don't think there's currently any R equivalent.
>
> https://arrow.apache.org/docs/python/generated/pyarrow.concat_tables.html#pyarrow.concat_tables
>
> My motivation here is that I will have large amounts of older
> data on disk, which changes only rarely, plus smaller amounts of newer
> data changing more frequently. I can store both the old and new data
> on disk in Arrow Feather files, so far so good.
>
> To analyze the data, I can simply mmap both one old and one newer file
> with arrow::read_feather(), find where their time-series of rows
> overlap, and use e.g. myTable$Slice() to select just the rows I
> want from each of the two Tables. So far so good. But then how
> should I properly combine them into one Table, for further analysis
> downstream?
>
> Thanks for your help!
>
> --
> Andrew Piskorski <a...@piskorski.com>
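P.S. For the Slice() workflow you describe: until a Table concatenation binding exists, the only purely in-memory route I can suggest in 7.0.0 is to round-trip through a data.frame, which does copy the data (a sketch; old_table, new_table, and the row ranges are placeholders for your two mmapped Tables):

> old_part <- old_table$Slice(0, 1000)
> new_part <- new_table$Slice(1000)
> combined <- arrow::Table$create(
+   rbind(as.data.frame(old_part), as.data.frame(new_part))
+ )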