Hello Chris,

At the moment, we have focused on sharing Arrow structures via inter-process
communication (IPC). In this case, the sharing is zero-serialization but not
zero-copy. Now that we have good integration tests for a good subset of the
implementations, sharing memory between different implementations without
copying the data is the next step.

As each Arrow implementation has its own user-facing data structures on top of
the same backing memory layout, we will have to write some APIs that can
convert one interface to another. A very simple example that takes the Java
Arrow structures and makes them available to Python is included in this PR
(comment): https://github.com/apache/arrow/pull/1693
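
To make "different interfaces, same backing memory" concrete without relying
on any Arrow API, here is a small sketch that uses only the Python standard
library. It is an analogy, not how the PR above works: an `array` owns the
memory, a `memoryview` is a second interface over the same bytes, and `bytes`
makes a real copy for contrast.

```python
from array import array

# The owner of the memory: three int64 values in one contiguous buffer.
values = array("q", [10, 20, 30])

# A second interface over the same memory; nothing is copied here.
view = memoryview(values)

# For contrast, this duplicates the underlying bytes.
copied = bytes(view)

# A write through the owner is visible through the view, but not in the copy.
values[0] = 99
print(view[0])                          # reads the shared memory
print(memoryview(copied).cast("q")[0])  # reads the independent copy
```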

Note that this is not needed for all languages. For example, the Python, Ruby
and GLib implementations are all backed by the C++ implementation. Here you can
simply extract the backing C++ object and use it in the other language. Thus a
pyarrow.Array created in Python already contains a C++ arrow::Array object,
which could then be used directly as the backing object for Ruby.

Uwe

On Thu, Apr 12, 2018, at 9:22 AM, Chris Withers wrote:
> Hi All,
> 
> Apologies if I'm on the wrong list or struggle to get my question 
> across, I'm very new to Arrow, so please point me to the best place if 
> there's somewhere better to ask these kinds of questions...
> 
> So, in my mind, Arrow provides a single in-memory model that supports
> access from a bunch of different languages/environments (Pandas, Go,
> C++, etc. from looking at https://github.com/apache/arrow), which gives
> me hope that, as someone just starting out on a project to go from a
> proprietary C++ trading framework's market data archive to Pandas
> dataframes, this would be a good place to look and, if things go through
> Arrow in the middle, potentially a way for other environments (Go, Julia?)
> to make use of the same thing.
> 
> That left me wondering, however: if I write a "to arrow" thing in
> C++, how would a Go or Python user then wire things up to get access to
> the Arrow data structures?
> Somewhat important bonus point: how would that happen without memory
> copies? (datasets here are many GB in most cases).
> 
> cheers,
> 
> Chris
