At the moment, we have focused on sharing Arrow structures via inter-process
communication (IPC). In this case, the sharing is zero-serialization but not
zero-copy. Given that we now have solid integration tests for a good subset of
all implementations, sharing memory between different implementations
with no copy of the data is the next step.
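
To make the copy vs. zero-copy distinction concrete, here is a minimal sketch
that uses only the Python standard library, not any Arrow API. The names
(producer_buffer, copied, shared) are made up for illustration, and the
bytearray merely stands in for the data buffer of a columnar array. The
consumer that receives a copy does not see later writes by the producer; the
consumer that holds a view onto the same memory does.

```python
import struct

# Eight native int64 values in one contiguous buffer, standing in for the
# data buffer of a columnar array (illustrative only, not an Arrow structure).
values = list(range(8))
producer_buffer = bytearray(struct.pack("8q", *values))

# Copying path: the consumer gets its own bytes, so later writes by the
# producer are invisible to it.
copied = bytes(producer_buffer)

# Zero-copy path: the consumer views the producer's memory directly and
# reinterprets it as int64 without moving any bytes.
shared = memoryview(producer_buffer).cast("q")

# The producer overwrites the first value in place.
struct.pack_into("q", producer_buffer, 0, 42)

copied_first = struct.unpack_from("q", copied, 0)[0]
shared_first = shared[0]
print(copied_first, shared_first)  # prints "0 42"
```

Sharing between Arrow implementations in the same process would follow the
second path: both sides agree on the memory layout, so only a pointer and
some metadata need to change hands.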

As each Arrow implementation has its own user-facing data structures on top of
the same backing memory layout, we will have to write some APIs that can
convert one interface to another. A very simple example that takes the Java
Arrow structures and makes them available to Python is included in this PR.

Note that this is not needed for all languages. For example, the Python, Ruby
and GLib implementations are all backed by the C++ implementation. Here you can
simply extract the backing C++ object and use it in the other language. Thus a
pyarrow.Array created in Python already contains a C++ arrow::Array object,
which could then be used directly as the backing object for Ruby.
On Thu, Apr 12, 2018, at 9:22 AM, Chris Withers wrote:
> Hi All,
> Apologies if I'm on the wrong list or struggle to get my question
> across, I'm very new to Arrow, so please point me to the best place if
> there's somewhere better to ask these kinds of questions...
> So, in my mind, Arrow provides a single in-memory model that supports
> access from a bunch of different languages/environments (Pandas, Go,
> C++, etc from looking at https://github.com/apache/arrow), which gives
> me hope that, as someone just starting out on a project to go from a
> proprietary C++ trading framework's market data archive to Pandas
> dataframes would be a good way to look and, if things go through arrow
> in the middle, potentially a way for other environments (Go, Julia?) to
> make use of the same thing.
> That left me wondering, however, that if I write a "to arrow" thing in
> C++, how would a Go or Python user then wire things up to get access to
> the Arrow data structures?
> Somewhat important bonus point: how would that happen without memory
> copies? (datasets here are many GB in most cases).