Hi, I am using the latest R arrow package from CRAN, 7.0.0.

  https://cran.r-project.org/web/packages/arrow/

What is the right way to concatenate the rows of two Arrow Tables
into one Table, i.e. the equivalent of rbind() in base R?  Can I do
this as a zero-copy view in memory, or will I need to write the new
Table to disk before I can use it?

Right now I'm primarily concerned with doing this in R, but if it can
be done in a better way using the Arrow C++ or other libraries, I'm
definitely interested in understanding that too.

It sounds like pyarrow.concat_tables() is the right tool for this, but
I don't think there's currently any R equivalent.

  https://arrow.apache.org/docs/python/generated/pyarrow.concat_tables.html#pyarrow.concat_tables

My motivation is that I will have large amounts of older data on
disk, which changes only rarely, plus smaller amounts of newer data
that change more frequently.  I can store both the old and new data
on disk in Arrow Feather files; so far so good.

To analyze the data, I can simply mmap one old and one newer file
with arrow::read_feather(), find where their time series of rows
overlap, and use e.g. myTable$Slice() to select just the rows I want
from each of the two Tables.  So far so good.  But how should I then
properly combine them into one Table for further downstream analysis?

Thanks for your help!

-- 
Andrew Piskorski <a...@piskorski.com>
