Hello Chris,

At the moment, we have focused on sharing Arrow structures via inter-process communication (IPC). In this case, the sharing is zero-serialization but not zero-copy: the data travels in the Arrow memory format itself, so the receiver needs no deserialization step, but the bytes are typically still copied into the receiving process. Given that we now have good integration tests for a good subset of all implementations, sharing memory between different implementations without copying the data is the next step.
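To make the distinction between "zero-serialization" and "zero-copy" concrete, here is a minimal sketch using only the Python standard library. It is not Arrow's IPC format (which also carries a schema and metadata); it assumes a toy stand-in for an Arrow int32 column without nulls, namely an `array.array("i")`, which on the usual little-endian platforms with 4-byte C ints has the same shape as the contiguous values buffer Arrow uses for fixed-width types.

```python
import array
import io

# Toy stand-in for an Arrow int32 column with no nulls: Arrow keeps the values
# of a fixed-width primitive array in one contiguous buffer. On a little-endian
# machine with 4-byte C ints, array("i") has that same layout.
values = array.array("i", [10, 20, 30, 40])

# 1) IPC-style hand-off: the raw bytes are written to a stream (a copy) ...
stream = io.BytesIO()
stream.write(values.tobytes())
received = stream.getvalue()

# ... but the receiver parses nothing; it reinterprets the bytes in place.
# This is "zero-serialization but not zero-copy".
ipc_view = memoryview(received).cast("i")

# 2) Zero-copy hand-off: the consumer wraps the producer's memory directly.
shared_view = memoryview(values)

# Changing an element in the producer's buffer is visible through the
# zero-copy view, but not through the copy that went over the "wire".
values[0] = 99
print(ipc_view.tolist())     # [10, 20, 30, 40]
print(shared_view.tolist())  # [99, 20, 30, 40]
```

Path 1 corresponds to the IPC sharing described above; path 2 is what sharing memory between implementations aims for.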
As each Arrow implementation has its own user-facing data structures on top of the same backing memory layout, we will have to write some APIs that convert one interface into another. A very simple example that takes the Java Arrow structures and makes them available to Python is included in this PR (comment): https://github.com/apache/arrow/pull/1693

Note that this is not needed for all languages. For example, the Python, Ruby and GLib implementations are all backed by the C++ implementation. Here you can simply extract the backing C++ object and use it in the other language. Thus a pyarrow.Array created in Python already contains a C++ arrow::Array object, which could then be used directly as the backing object for Ruby.

Uwe

On Thu, Apr 12, 2018, at 9:22 AM, Chris Withers wrote:
> Hi All,
>
> Apologies if I'm on the wrong list or struggle to get my question
> across; I'm very new to Arrow, so please point me to the best place if
> there's somewhere better to ask these kinds of questions...
>
> So, in my mind, Arrow provides a single in-memory model that supports
> access from a bunch of different languages/environments (Pandas, Go,
> C++, etc., from looking at https://github.com/apache/arrow), which gives
> me hope that, for someone just starting out on a project to go from a
> proprietary C++ trading framework's market data archive to Pandas
> dataframes, it would be a good way to go and, if things go through Arrow
> in the middle, potentially a way for other environments (Go, Julia?) to
> make use of the same thing.
>
> That left me wondering, however: if I write a "to Arrow" thing in
> C++, how would a Go or Python user then wire things up to get access to
> the Arrow data structures?
> Somewhat important bonus point: how would that happen without memory
> copies? (Datasets here are many GB in most cases.)
>
> cheers,
>
> Chris