I've created a Jira issue to track rbind implementation in R: https://issues.apache.org/jira/browse/ARROW-15989

On Mon, Mar 21, 2022 at 12:15 PM Will Jones <will.jones...@gmail.com> wrote:

> Hi Andrew,
>
> I don't think we've implemented rbind yet, unfortunately. We've just
> implemented concat_arrays (also bound to c()) [1], and that will be
> available in the next release (or nightlies right now).
>
> The one way you could "rbind" multiple feather files, if they have the
> same schema, is by constructing a union dataset out of the two or more
> files. This would look something like this:
>
> > ds1 <- arrow::open_dataset("file1.feather", format="feather")
> > ds2 <- arrow::open_dataset("file2.feather", format="feather")
> > ds <- c(ds1, ds2)
> > my_table <- collect(ds)
>
> [1] https://github.com/apache/arrow/pull/12324
>
> On Mon, Mar 21, 2022 at 11:49 AM Andrew Piskorski <a...@piskorski.com>
> wrote:
>
>> Hi, I am using the latest R arrow package from CRAN, 7.0.0.
>>
>> https://cran.r-project.org/web/packages/arrow/
>>
>> What is the right way to concatenate rows from two Arrow Tables
>> together into one Table? AKA, rbind() in base R. Can I do this as a
>> zero-copy view in memory, or will I need to write the new Table to
>> disk first before I can use it?
>>
>> Right now I'm primarily concerned with doing this in R, but if it can
>> be done in a better way using the Arrow C++ or other libraries, I'm
>> definitely interested in understanding that too.
>>
>> It sounds like pyarrow.concat_tables() is the right tool for this, but
>> I don't think there's currently any R equivalent.
>>
>> https://arrow.apache.org/docs/python/generated/pyarrow.concat_tables.html#pyarrow.concat_tables
>>
>> My motivation here is that I will have large amounts of older
>> data on disk, which changes only rarely, plus smaller amounts of newer
>> data changing more frequently. I can store both the old and new data
>> on disk in Arrow Feather files, so far so good.
>>
>> To analyze the data, I can simply mmap both one old and one newer file
>> with arrow::read_feather(), find where their time-series of rows
>> overlap, and use e.g. myTable$Slice() to select just the rows I
>> want from each of the two Tables. So far so good. But then how
>> should I properly combine them into one Table, for further analysis
>> downstream?
>>
>> Thanks for your help!
>>
>> --
>> Andrew Piskorski <a...@piskorski.com>
>>
>