Wes McKinney created ARROW-1047:
-----------------------------------

             Summary: [Java] Add generalized stream writer and reader 
interfaces that are decoupled from IO / message framing
                 Key: ARROW-1047
                 URL: https://issues.apache.org/jira/browse/ARROW-1047
             Project: Apache Arrow
          Issue Type: New Feature
          Components: Java - Vectors
            Reporter: Wes McKinney


cc [~julienledem] [~elahrvivaz] [~nongli]

The ArrowWriter 
https://github.com/apache/arrow/blob/master/java/vector/src/main/java/org/apache/arrow/vector/file/ArrowWriter.java
 accepts a WriteableByteChannel where the stream is written

It would be useful to be able to support other kinds of message framing and 
transport, like GRPC or HTTP. So rather than writing a complete Arrow stream as 
a single contiguous byte stream, the component messages (schema, dictionaries, 
and record batches) would be framed as separate messages in the underlying 
protocol. 

So if we were using ProtocolBuffers and gRPC as the underlying transport for 
the stream, we could encapsulate components of an Arrow stream in objects like:

{code:language=protobuf}
message ArrowMessagePB {
  required bytes serialized_data;
}
{code}

If the transport supports zero copy, that is obviously better than serializing 
then parsing a protocol buffer.

We should do this work in C++ as well to support more flexible stream 
transport. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

Reply via email to