Thanks, this is very instructive.  Is JSON the standard representation for
Arrow Schema objects across the different language implementations?  If so, is
the usual workflow to compile a JSON definition into, say, a Java class, and
then use that class to produce Arrow Schema objects?

What is the canonical way to define a language-agnostic schema, generate
Java/C++ code from it (perhaps a POJO), and then transform that into an Arrow
Schema using an API like so:

Schema toArrowSchema(Class<?> schema);

Or if there's an even better way to do things, I'd be interested to hear too.
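
To make the question concrete, here is a minimal sketch of just the reflection
step, using only the JDK.  Everything in it is hypothetical: Trade stands in
for a generated POJO, ColumnSpec is a placeholder of my own rather than an
Arrow class, and the type names ("int64", "utf8", ...) are illustrative labels.
Mapping each ColumnSpec onto Arrow's own field and schema classes is
deliberately left out, since that is the part I am asking about.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class PojoSchemaSketch {

    // Hypothetical POJO standing in for a class generated from a schema definition.
    static class Trade {
        long id;
        String symbol;
        double price;
        static int instances; // static, so not part of the row layout
    }

    // Hypothetical neutral column description; NOT an Arrow class.
    static final class ColumnSpec {
        final String name;
        final String logicalType;
        final boolean nullable;

        ColumnSpec(String name, String logicalType, boolean nullable) {
            this.name = name;
            this.logicalType = logicalType;
            this.nullable = nullable;
        }

        @Override
        public String toString() {
            return name + ":" + logicalType + (nullable ? "?" : "");
        }
    }

    // Walks the instance fields of a POJO and describes each one as a column.
    static List<ColumnSpec> describe(Class<?> pojo) {
        List<ColumnSpec> columns = new ArrayList<>();
        for (Field f : pojo.getDeclaredFields()) {
            if (Modifier.isStatic(f.getModifiers()) || f.isSynthetic()) {
                continue; // skip static and compiler-generated fields
            }
            Class<?> t = f.getType();
            // Assumption: primitives cannot be null, reference types can.
            columns.add(new ColumnSpec(f.getName(), logicalTypeOf(t), !t.isPrimitive()));
        }
        // getDeclaredFields() guarantees no particular order, so sort by name
        // to get a stable column order.
        columns.sort(Comparator.comparing((ColumnSpec c) -> c.name));
        return columns;
    }

    // Illustrative mapping from Java types to logical type labels.
    static String logicalTypeOf(Class<?> t) {
        if (t == int.class || t == Integer.class) return "int32";
        if (t == long.class || t == Long.class) return "int64";
        if (t == double.class || t == Double.class) return "float64";
        if (t == boolean.class || t == Boolean.class) return "bool";
        if (t == String.class) return "utf8";
        if (t == byte[].class) return "binary";
        throw new IllegalArgumentException("No mapping for " + t.getName());
    }

    public static void main(String[] args) {
        // prints [id:int64, price:float64, symbol:utf8?]
        System.out.println(describe(Trade.class));
    }
}
```

Sorting by name is only there to make the column order deterministic; a real
definition format would presumably carry an explicit field order instead.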

Thanks!

From: dev@arrow.apache.org At: 09/18/17 22:59:28 To: dev@arrow.apache.org
Cc:  Andrew Pham (BLOOMBERG/ 731 LEX ) 
Subject: Re: Java Examples Writing Flatbuffer in IPC Message

Here is the code Wes is referring to:
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/arrow/ArrowConverters.scala#L73

This turns Spark rows into the Arrow file format.

Here is the reading part in Python:
https://github.com/apache/spark/blob/master/python/pyspark/serializers.py#L195


On Mon, Sep 18, 2017 at 9:34 PM, Wes McKinney <wesmck...@gmail.com> wrote:

> I would suggest you take a look at the Arrow converter in Spark
> (search in the codebase), or one of the other developers or users may
> be able to respond on the list.
>
> On Fri, Sep 15, 2017 at 11:41 AM, Andrew Pham (BLOOMBERG/ 731 LEX)
> <apha...@bloomberg.net> wrote:
> > Thanks Wes.  I'm going over the Java source code and I have a few
> > questions:
> >
> > Do we need to define a VectorSchemaRoot every time we define an
> > ArrowStreamWriter?  In the interest of developing a simple proof of
> > concept, for now I would just like to ferry a simple byte[] array onto the
> > OutputStream.  What would be the easiest way to construct the corresponding
> > VectorSchemaRoot?
> >
> > I'm also assuming that DictionaryProvider is optional.
> >
> > Thanks again!
> >
> > From: dev@arrow.apache.org At: 09/14/17 18:45:15 To: Andrew Pham
> > (BLOOMBERG/ 731 LEX), dev@arrow.apache.org
> > Subject: Re: Java Examples Writing Flatbuffer in IPC Message
> >
> > 1. Are you able to write to a shared memory IPC segment as an
> > OutputStream in Java? If so, do that, and use either the
> > ArrowFileWriter or ArrowStreamWriter in Java to write to it, like
> > https://github.com/apache/arrow/blob/master/java/tools/src/main/java/org/apache/arrow/tools/Integration.java#L160
> >
> > 2. Get a handle to that IPC segment in C++
> >
> > 3. Wrap the C++ IPC segment in an arrow::io::BufferReader
> >
> > 4. Use the functions in arrow/ipc/reader.h to reconstruct the data
> > structures from the BufferReader
> >
> > There are Arrow JIRAs to make #1, #2, and #3 easier out of the box,
> > but you must handle these details manually for now.
> >
> > On Thu, Sep 14, 2017 at 6:37 PM, Andrew Pham (BLOOMBERG/ 731 LEX)
> > <apha...@bloomberg.net> wrote:
> >> Thanks for the reply Wes.  I'm just looking for a way to ferry
> >> structured data (like a row in a SQL table) from one process (Java) to
> >> another (C++).  Any ideas on how to execute this?
> >>
> >> From: dev@arrow.apache.org At: 09/14/17 18:32:12 To: Andrew Pham
> >> (BLOOMBERG/ 731 LEX), dev@arrow.apache.org
> >> Subject: Re: Java Examples Writing Flatbuffer in IPC Message
> >>
> >> hi Andrew,
> >>
> >> Are you talking about writing a stream to shared memory? There are no
> >> functions in the Arrow Java library to do this out of the box. This
> >> would be a great contribution to make to the project:
> >> https://issues.apache.org/jira/browse/ARROW-721
> >>
> >> If you create an OutputStream that writes to shared memory, you can
> >> use the existing stream serialization tools. In C++ the functions to
> >> read a stream or read a single record batch are contained in
> >> https://github.com/apache/arrow/blob/master/cpp/src/arrow/ipc/reader.h.
> >>
> >> In C++ you can use the Plasma store for writing to POSIX shared
> >> memory. I would like to see some generic named POSIX shared memory
> >> tools get developed that are independent of Plasma, though:
> >> https://issues.apache.org/jira/browse/ARROW-1385
> >>
> >> Thanks
> >> Wes
> >>
> >>
> >> On Thu, Sep 14, 2017 at 6:24 PM, Andrew Pham (BLOOMBERG/ 731 LEX)
> >> <apha...@bloomberg.net> wrote:
> >>> The receiver/reader would be a C++ process.
> >>>
> >>> I'm looking for simple examples that illustrate this IPC.  Can anyone
> >>> point me to any?  Thanks!
> >>
> >>
> >
> >
>

