Here is the code Wes is referring to: https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/arrow/ArrowConverters.scala#L73
This turns Spark rows into the Arrow file format.

Here is the reading part in Python:
https://github.com/apache/spark/blob/master/python/pyspark/serializers.py#L195

On Mon, Sep 18, 2017 at 9:34 PM, Wes McKinney <wesmck...@gmail.com> wrote:
> I would suggest you take a look at the Arrow converter in Spark
> (search in the codebase), or one of the other developers or users may
> be able to respond on the list.
>
> On Fri, Sep 15, 2017 at 11:41 AM, Andrew Pham (BLOOMBERG/ 731 LEX)
> <apha...@bloomberg.net> wrote:
> > Thanks Wes. I'm going over the Java source code and I have a few
> > questions:
> >
> > Do we need to define a VectorSchemaRoot every time we define an
> > ArrowStreamWriter? In the interest of developing a simple proof of
> > concept, for now I would just like to ferry a simple byte[] array onto
> > the OutputStream. What would be the easiest way to construct the
> > corresponding VectorSchemaRoot?
> >
> > I'm also assuming that DictionaryProvider is optional.
> >
> > Thanks again!
> >
> > From: dev@arrow.apache.org At: 09/14/17 18:45:15 To: Andrew Pham
> > (BLOOMBERG/ 731 LEX ) , dev@arrow.apache.org
> > Subject: Re: Java Examples Writing Flatbuffer in IPC Message
> >
> > 1. Are you able to write to a shared memory IPC segment as an
> > OutputStream in Java? If so, do that, and use either the
> > ArrowFileWriter or ArrowStreamWriter in Java to write to it, like
> > https://github.com/apache/arrow/blob/master/java/tools/src/main/java/org/apache/arrow/tools/Integration.java#L160
> >
> > 2. Get a handle to that IPC segment in C++
> >
> > 3. Wrap the C++ IPC segment in an arrow::io::BufferReader
> >
> > 4. Use the functions in arrow/ipc/reader.h to reconstruct the data
> > structures from the BufferReader
> >
> > There are Arrow JIRAs to make #1, #2, and #3 easier out of the box,
> > but you must handle these details manually for now.
> >
> > On Thu, Sep 14, 2017 at 6:37 PM, Andrew Pham (BLOOMBERG/ 731 LEX)
> > <apha...@bloomberg.net> wrote:
> >> Thanks for the reply Wes. I'm just looking for a way to ferry
> >> structured data (like a row in a SQL table) from one process (Java)
> >> to another (C++). Any ideas on how to execute this?
> >>
> >> From: dev@arrow.apache.org At: 09/14/17 18:32:12 To: Andrew Pham
> >> (BLOOMBERG/ 731 LEX ) , dev@arrow.apache.org
> >> Subject: Re: Java Examples Writing Flatbuffer in IPC Message
> >>
> >> hi Andrew,
> >>
> >> Are you talking about writing a stream to shared memory? There are not
> >> functions in the Arrow Java library to do this out of the box. This
> >> would be a great contribution to make to the project:
> >> https://issues.apache.org/jira/browse/ARROW-721
> >>
> >> If you create an OutputStream that writes to shared memory, you can
> >> use the existing stream serialization tools. In C++ the functions to
> >> read a stream or read a single record batch are contained in
> >> https://github.com/apache/arrow/blob/master/cpp/src/arrow/ipc/reader.h.
> >>
> >> In C++ you can use the Plasma store for writing to POSIX shared
> >> memory. I would like to see some generic named POSIX shared memory
> >> tools get developed that are independent of Plasma, though:
> >> https://issues.apache.org/jira/browse/ARROW-1385
> >>
> >> Thanks
> >> Wes
> >>
> >> On Thu, Sep 14, 2017 at 6:24 PM, Andrew Pham (BLOOMBERG/ 731 LEX)
> >> <apha...@bloomberg.net> wrote:
> >>> The receiver/reader would be a C++ process.
> >>>
> >>> I'm looking for simple examples that illustrate this IPC. Can anyone
> >>> point me to any? Thanks!
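
For step 1 quoted above (writing to a shared memory segment as an OutputStream in Java), here is a minimal sketch using only the JDK. It is an illustration under stated assumptions, not code from Arrow or Spark: the class names MappedOutputStream and SharedSegmentDemo, the 4096-byte capacity, and the use of a memory-mapped temp file as a stand-in for the shared segment are all hypothetical. The sketch ferries a plain byte[] through the mapping and reads it back. In the setup discussed in the thread, the same stream would instead be handed to an ArrowStreamWriter, and the reading side would be the C++ process.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;

/** Hypothetical demo: write bytes through the mapped stream, then read them back. */
final class SharedSegmentDemo {
    static byte[] roundTrip(Path path, byte[] payload) throws IOException {
        int written;
        // Writer side: 4096 is an arbitrary capacity chosen for this sketch.
        try (MappedOutputStream out = new MappedOutputStream(path, 4096)) {
            out.write(payload);
            written = out.bytesWritten();
        }
        // Reader side: map the same file read-only. The length is passed
        // out of band here; a real protocol would need to communicate it.
        try (FileChannel ch = FileChannel.open(path, StandardOpenOption.READ)) {
            MappedByteBuffer in = ch.map(FileChannel.MapMode.READ_ONLY, 0, written);
            byte[] result = new byte[written];
            in.get(result);
            return result;
        }
    }

    public static void main(String[] args) throws IOException {
        Path path = Files.createTempFile("ipc-sketch", ".bin");
        path.toFile().deleteOnExit();
        byte[] payload = {1, 2, 3, 4};
        System.out.println(Arrays.equals(payload, roundTrip(path, payload)));
    }
}

/** Hypothetical helper: an OutputStream backed by a memory-mapped file. */
final class MappedOutputStream extends OutputStream {
    private final RandomAccessFile file;
    private final MappedByteBuffer buffer;

    MappedOutputStream(Path path, int capacity) throws IOException {
        file = new RandomAccessFile(path.toFile(), "rw");
        file.setLength(capacity); // size the backing file before mapping it
        buffer = file.getChannel().map(FileChannel.MapMode.READ_WRITE, 0, capacity);
    }

    @Override
    public void write(int b) throws IOException {
        if (!buffer.hasRemaining()) {
            throw new IOException("mapped segment is full");
        }
        buffer.put((byte) b);
    }

    @Override
    public void write(byte[] src, int off, int len) throws IOException {
        if (len > buffer.remaining()) {
            throw new IOException("mapped segment is full");
        }
        buffer.put(src, off, len);
    }

    /** Number of bytes written so far. */
    int bytesWritten() {
        return buffer.position();
    }

    @Override
    public void flush() {
        buffer.force(); // push dirty pages to the backing file
    }

    @Override
    public void close() throws IOException {
        buffer.force();
        file.close();
    }
}
```

The fixed capacity keeps the sketch short. A stream meant for real use would need a way to grow or chain segments, and a way to tell the reader how many bytes are valid.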