Here is the code Wes is referring to:
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/arrow/ArrowConverters.scala#L73

This converts Spark rows into the Arrow file format.

Here is the reading part in Python:
https://github.com/apache/spark/blob/master/python/pyspark/serializers.py#L195
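
For the OutputStream idea discussed further down the thread, here is a rough sketch of a stream backed by a memory-mapped file, using only the JDK. It is not taken from Arrow or Spark. The class names, the temp-file path, and the fixed 4096-byte capacity are all made up for illustration, and a real setup would presumably map a file that the C++ side can also open and would need some way to tell the reader how many bytes were written. As far as I know, the Arrow Java writers accept an OutputStream or a WritableByteChannel, so a stream like this could be handed to them, but I have not tested that here.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical helper: an OutputStream that writes into a memory-mapped file. */
final class MappedOutputStream extends OutputStream {
    private final MappedByteBuffer buffer;

    MappedOutputStream(Path path, int capacity) throws IOException {
        try (FileChannel channel = FileChannel.open(path,
                StandardOpenOption.CREATE,
                StandardOpenOption.READ,
                StandardOpenOption.WRITE)) {
            // Mapping READ_WRITE grows the file to `capacity` if it is smaller.
            // The mapping stays valid after the channel is closed.
            buffer = channel.map(FileChannel.MapMode.READ_WRITE, 0, capacity);
        }
    }

    @Override
    public void write(int b) throws IOException {
        if (!buffer.hasRemaining()) {
            throw new IOException("mapped region is full");
        }
        buffer.put((byte) b);
    }

    @Override
    public void write(byte[] src, int off, int len) throws IOException {
        if (len > buffer.remaining()) {
            throw new IOException("mapped region is full");
        }
        buffer.put(src, off, len);
    }

    @Override
    public void flush() {
        // Ask the OS to write the mapped pages back to the underlying file.
        buffer.force();
    }

    /** Number of bytes written so far; a reader needs this to know where to stop. */
    int bytesWritten() {
        return buffer.position();
    }
}

/** Hypothetical demo: write a payload through the mapping and read it back. */
final class MappedOutputStreamDemo {
    static byte[] roundTrip(byte[] payload) throws IOException {
        Path path = Files.createTempFile("arrow-ipc-sketch", ".bin");
        try {
            int written;
            try (MappedOutputStream out = new MappedOutputStream(path, 4096)) {
                out.write(payload);
                out.flush();
                written = out.bytesWritten();
            }
            // The file is 4096 bytes long; keep only the bytes actually written.
            return Arrays.copyOf(Files.readAllBytes(path), written);
        } finally {
            Files.deleteIfExists(path);
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] payload = "hello arrow".getBytes(StandardCharsets.UTF_8);
        byte[] back = roundTrip(payload);
        System.out.println(new String(back, StandardCharsets.UTF_8));
    }
}
```

The read-back here goes through the file system in the same process only to keep the sketch self-contained; in the Java-to-C++ case the other process would map the same file instead.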


On Mon, Sep 18, 2017 at 9:34 PM, Wes McKinney <wesmck...@gmail.com> wrote:

> I would suggest you take a look at the Arrow converter in Spark
> (search in the codebase), or one of the other developers or users may
> be able to respond on the list.
>
> On Fri, Sep 15, 2017 at 11:41 AM, Andrew Pham (BLOOMBERG/ 731 LEX)
> <apha...@bloomberg.net> wrote:
> > Thanks Wes.  I'm going over the Java source code and I have a few
> > questions:
> >
> > Do we need to define a VectorSchemaRoot every time we define an
> > ArrowStreamWriter?  In the interest of developing a simple proof of
> > concept, for now I would just like to ferry a simple byte[] array onto the
> > OutputStream.  What would be the easiest way to construct the corresponding
> > VectorSchemaRoot?
> >
> > I'm also assuming that DictionaryProvider is optional
> >
> > Thanks again!
> >
> > From: dev@arrow.apache.org At: 09/14/17 18:45:15 To: Andrew Pham
> > (BLOOMBERG/ 731 LEX), dev@arrow.apache.org
> > Subject: Re: Java Examples Writing Flatbuffer in IPC Message
> >
> > 1. Are you able to write to a shared memory IPC segment as an
> > OutputStream in Java? If so, do that, and use either the
> > ArrowFileWriter or ArrowStreamWriter in Java to write to it, like
> > https://github.com/apache/arrow/blob/master/java/tools/src/main/java/org/apache/arrow/tools/Integration.java#L160
> >
> > 2. Get a handle to that IPC segment in C++
> >
> > 3. Wrap the C++ IPC segment in an arrow::io::BufferReader
> >
> > 4. Use the functions in arrow/ipc/reader.h to reconstruct the data
> > structures from the BufferReader
> >
> > There are Arrow JIRAs to make #1, #2, and #3 easier out of the box,
> > but you must handle these details manually for now.
> >
> > On Thu, Sep 14, 2017 at 6:37 PM, Andrew Pham (BLOOMBERG/ 731 LEX)
> > <apha...@bloomberg.net> wrote:
> >> Thanks for the reply Wes.  I'm just looking for a way to ferry
> >> structured data (like a row in a SQL table) from one process (Java) to
> >> another (C++).  Any ideas on how to execute this?
> >>
> >> From: dev@arrow.apache.org At: 09/14/17 18:32:12 To: Andrew Pham
> >> (BLOOMBERG/ 731 LEX), dev@arrow.apache.org
> >> Subject: Re: Java Examples Writing Flatbuffer in IPC Message
> >>
> >> hi Andrew,
> >>
> >> Are you talking about writing a stream to shared memory? There are no
> >> functions in the Arrow Java library to do this out of the box. This
> >> would be a great contribution to make to the project:
> >> https://issues.apache.org/jira/browse/ARROW-721
> >>
> >> If you create an OutputStream that writes to shared memory, you can
> >> use the existing stream serialization tools. In C++ the functions to
> >> read a stream or read a single record batch are contained in
> >> https://github.com/apache/arrow/blob/master/cpp/src/arrow/ipc/reader.h.
> >>
> >> In C++ you can use the Plasma store for writing to POSIX shared
> >> memory. I would like to see some generic named POSIX shared memory
> >> tools get developed that are independent of Plasma, though:
> >> https://issues.apache.org/jira/browse/ARROW-1385
> >>
> >> Thanks
> >> Wes
> >>
> >>
> >> On Thu, Sep 14, 2017 at 6:24 PM, Andrew Pham (BLOOMBERG/ 731 LEX)
> >> <apha...@bloomberg.net> wrote:
> >>> The receiver/reader would be a C++ process.
> >>>
> >>> I'm looking for simple examples that illustrate this IPC.  Can anyone
> >>> point me to any?  Thanks!
> >>
> >>
> >
> >
>
