I also sent a note about it to the dev list a month ago. We still have a huge internal need and are interested in helping push this along where we can. Unfortunately, our team is more focused on Spark and doesn't have much experience working with the R community.

On Wed, Jul 19, 2017 at 1:44 PM Clark Fitzgerald <[email protected]> wrote:

> Hello all,
>
> I saw the notes come through from today's call:
>
> > * R Arrow Bindings?
> > - Find use cases within the R community, contributors needed
> > - R Feather bindings a useful starting point
>
> This year I've been working on parallel R on datasets in the 100+ GB range,
> and have found that loading and saving data from text files is a real
> bottleneck. Another consideration is breaking the data up into chunks for
> parallel processing while maintaining metadata and overall structure. So
> I've been watching Parquet and Arrow.
>
> Specifically here are two use cases in R where Arrow / Parquet could be
> helpful:
>
> - Splitting up a large data set into pieces which fit comfortably in memory
> then applying normal R functions to each piece. Basically GROUP BY.
> - Matloff's Software Alchemy, statistical averaging based on independent
> chunks of data. This requires rows to be randomly assigned to chunks.
>
> Another option besides starting from the R Feather bindings is to start
> with an automatically generated set of bindings:
> https://github.com/duncantl/RCodeGen
>
> Best,
> Clark Fitzgerald
>

--
VP of Engineering - dv01, Featured in Forbes Fintech 50 For 2016 <http://www.forbes.com/fintech/2016/#310668d56680>
915 Broadway | Suite 502 | New York, NY 10010
(646)-838-2310
[email protected] | www.dv01.co
