Correct, I'm maintaining standard protobuf encoding so a consumer that doesn't go byte by byte can still consumer/produce the messages.
More impls: for sure. On Wed, May 30, 2018 at 9:01 AM, Wes McKinney <wesmck...@gmail.com> wrote: > I see; looking more closely I see you've sidestepped the standard > Protobuf serialization to write the stream as tagged components: > > https://github.com/apache/arrow/compare/master...jacques-n:flight#diff- > 02cfc9235e22653fce8a7636c9f95507R241 > > and then reading the fields of the message tag by tag > > https://github.com/apache/arrow/compare/master...jacques-n:flight#diff- > 02cfc9235e22653fce8a7636c9f95507R159 > > Would it be correct that if a GRPC implementation doesn't provide > sufficient access to the byte stream (or if it doesn't care enough > about zero copy) that you could allow GRPC to return an instance of > the FlightData structure? > > I expect we'd want to see a few interoperable implementations (I > suggest Java, C++, Go) to harden the fine details. > > - Wes > > On Mon, May 28, 2018 at 3:32 PM, Jacques Nadeau <jacq...@apache.org> > wrote: > > Cutting through the layers of GRPC will be a per language approach thing. > > Assuming that each GRPC language implementation does a good job of > > separating message encapsulation from the base library, this should be > > straight-forward-ish. Hope improves around this as I see creation of > > non-protobuf protocols built on top of the base GRPC [1]. How to do this > in > > each language will probably take time looking at the GRPC internals for > > that language but can be a secondary step once you get the protocol > working > > (you can just pay for extra copies until then). > > > > In my Java approach I believe I do one read copy and zero write copies > > (needs more testing) which was my target. (Getting to zero-copy on read > > means a lot more complexity because your socket-reading has to be > protocol > > aware: even our bespoke layer in Dremio doesn't try to do that. I'd guess > > KRPC does the same but haven't reviewed the code to confirm.) > > > > Will try to get some more slides/readme and a proper proposed patch up > soon. > > > > [1] https://grpc.io/blog/flatbuffers > > > > > > > > On Mon, May 28, 2018 at 1:05 AM, Wes McKinney <wesmck...@gmail.com> > wrote: > > > >> hey Jacques, > >> > >> This is great news, I look forward to digging into this. My biggest > >> initial question is the Protobuf encapsulation, specifically: > >> > >> https://github.com/jacques-n/arrow/blob/flight/java/flight/ > >> src/main/protobuf/flight.proto#L99 > >> > >> My understanding of Protocol Buffers is that on read, the "data_body" > >> memory would be copied out of the serialized protobuf that came across > >> the wire. Your comment in the .proto says this "comes last in the > >> definition to help with sidecar patterns" -- my read is that it would > >> be up to us to do our own sidecar implementation, similar to how > >> Apache Kudu has zero-copy sidecars in their KRPC system [1] (the > >> comment there describes pretty much exactly the problem we have). I > >> saw that you also replied on a GRPC thread about this issue [2]. Could > >> you summarize what (if anything) stands in the way to get zero-copy on > >> write and read? > >> > >> - Wes > >> > >> [1]: https://github.com/apache/kudu/blob/master/src/kudu/rpc/ > >> rpc_sidecar.h#L34 > >> [2]: https://github.com/grpc/grpc-java/issues/1054#issuecomment- > 391692087 > >> > >> On Thu, May 24, 2018 at 6:57 AM, Jacques Nadeau <jacq...@apache.org> > >> wrote: > >> > FYI, if you want to see an example server you can run with a GRPC > >> generated > >> > client, you can run the ExampleFlightServer located at [1]. Very basic > >> > 'test' with that class and client is located at [2]. > >> > > >> > [1] > >> > https://github.com/jacques-n/arrow/tree/flight/java/flight/ > >> src/main/java/org/apache/arrow/flight/example > >> > [2] > >> > https://github.com/jacques-n/arrow/blob/flight/java/flight/ > >> src/test/java/org/apache/arrow/flight/example/TestExampleServer.java > >> > > >> > > >> > On Thu, May 24, 2018 at 11:51 AM, Jacques Nadeau <jacq...@apache.org> > >> wrote: > >> > > >> >> Hey All, > >> >> > >> >> I used my Strata talk today as a forcing function to make additional > >> >> progress on a GRPC-based Arrow RPC protocol [1]. I’m calling it > “Apache > >> >> Arrow Flight”. You can take a look at the work here [2]. I’ll work to > >> clean > >> >> up my work and explain my thoughts about the protocol in the coming > >> days. > >> >> High-level: use protobuf as a encapsulation format so that any client > >> that > >> >> is supported in GRPC will work. However, we can optimize the > read/write > >> >> path for targeted languages and hand control the > >> >> serialization/deserialization and memory handling. (I did that in > this > >> Java > >> >> patch [3][4][5].) I also looked at starting to use GRPC generated > >> bindings > >> >> within Python but it looks like some glue code may be needed in the > C++ > >> >> layer since Python delegates down frequently. I also am still trying > to > >> >> understand GRPC back-pressure patterns and whether the protocol > >> >> realistically needs to change to cover real-world high performance > use > >> >> cases. > >> >> > >> >> I’ll send out some slides about the ideas and update README, etc. > soon. > >> >> > >> >> Thanks, > >> >> Jacques > >> >> > >> >> [1] https://github.com/jacques-n/arrow/blob/flight/java/flight/ > >> >> src/main/protobuf/flight.proto > >> >> [2] http://github.com/jacques-n/arrow/ > >> >> [3] https://github.com/jacques-n/arrow/tree/flight/ > >> >> java/flight/src/main/java/org/apache/arrow/flight/grpc > >> >> [4] https://github.com/jacques-n/arrow/blob/flight/ > >> >> java/flight/src/main/java/org/apache/arrow/flight/ > >> ArrowMessage.java#L253 > >> >> <https://github.com/jacques-n/arrow/blob/flight/java/flight/ > >> src/main/java/org/apache/arrow/flight/ArrowMessage.java#L253> > >> >> [5] https://github.com/jacques-n/arrow/blob/flight/ > >> >> java/flight/src/main/java/org/apache/arrow/flight/ > >> ArrowMessage.java#L185 > >> >> > >> >> > >> >