Hi Clark,

Cool! Before you go too far down the rabbit hole, would you be open to
working within an R/ subdirectory in the Arrow codebase? It doesn't
have to be ready-to-ship software, and we are happy to set up a branch
in the repository for you to experiment in, so you don't have to worry
about disturbing the master branch or breaking builds. Otherwise,
importing your work into the project later will become more
complicated and will require the Arrow PMC to do some paperwork:
http://incubator.apache.org/ip-clearance/ .

I am happy to answer questions on the mailing list, offline, in JIRA
discussions, or on GitHub pull requests. I am sure that Uwe and the
other C++ developers will be happy to help as well.

To get some basics off the ground, the essentials are being able to
convert one or more record batches into an R data frame, and back.
This is what we did for pandas in

https://github.com/apache/arrow/blob/master/cpp/src/arrow/python/arrow_to_pandas.h
https://github.com/apache/arrow/blob/master/cpp/src/arrow/python/pandas_to_arrow.h

We have thin bindings in Cython (which is similar to Rcpp) that make
this callable from Python.

What Hadley and I put together quickly for Feather last year
effectively converted a single Arrow record batch to and from pandas
or R data frames. In Arrow, in practice, you may be working with a
table split into many smaller chunks.
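
To make the chunking point concrete, here is a rough sketch in plain
Python of the round trip between several record batches and one
contiguous data frame. It is only an illustration: ordinary lists stand
in for Arrow arrays, and the names batches_to_frame and
frame_to_batches are made up for this example, not Arrow APIs.

```python
# Illustrative sketch only: plain Python lists stand in for Arrow arrays,
# and none of these names are real Arrow APIs.
from typing import Dict, List

Batch = Dict[str, list]  # one record batch: column name -> values
Frame = Dict[str, list]  # a "data frame": column name -> one contiguous column


def batches_to_frame(batches: List[Batch]) -> Frame:
    """Concatenate the chunks of each column into one contiguous column."""
    if not batches:
        return {}
    names = list(batches[0])
    frame: Frame = {name: [] for name in names}
    for batch in batches:
        # Every batch of a table is assumed to share the same schema.
        if list(batch) != names:
            raise ValueError("all batches must share the same schema")
        for name in names:
            frame[name].extend(batch[name])
    return frame


def frame_to_batches(frame: Frame, chunk_size: int) -> List[Batch]:
    """Split a data frame back into batches of at most chunk_size rows."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    n_rows = len(next(iter(frame.values()), []))
    return [
        {name: col[start:start + chunk_size] for name, col in frame.items()}
        for start in range(0, n_rows, chunk_size)
    ]


batches = [
    {"x": [1, 2], "y": ["a", "b"]},
    {"x": [3], "y": ["c"]},
]
frame = batches_to_frame(batches)
round_trip = frame_to_batches(frame, chunk_size=2)
```

The real bindings would do the same thing per column type in C++, with
the extra work of mapping Arrow types and nulls onto R vectors.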

Looking forward to getting this off the ground!

Thanks,
Wes

On Thu, Jul 27, 2017 at 7:40 PM, Clark Fitzgerald <clarkfi...@gmail.com> wrote:
> I've got at least a "hello world" for R / Arrow bindings in progress.
> https://github.com/clarkfitzg/Rarrow
>
> Over the next couple weeks I plan to spend some time looking at the Arrow
> C++ and Python sources and write a few bindings by hand, then think about
> how to automatically generate bindings from the C++. Several approaches are
> possible, Rffi / rdyncall, Rcpp modules, or RCodegen / RCIndex leveraging
> Clang. Not sure which, if any, will work.
>
> I'm a beginner in C++. It would be very helpful if someone was available to
> answer questions on the C++ Arrow codebase, since I'd rather not email the
> whole dev list for this.
>
> Thanks,
> Clark
