Sorry, I'll be limiting myself to very high-level remarks, but I have a
bit of trouble understanding the whole design.

- Why is it called "Arrow RPC" when it doesn't seem to provide any
kind of RPC service to the user?

- Is the server supposed to store all "streams" (which means data, AFAIU)
by itself?  So it's a centralized data store?  Some kind of very simple

- Does this mean we'll have two independent, incompatible and
technologically heterogeneous client/server systems, one for shared-memory
data (Plasma), one for data with copies (Flight)?

- Why two "reference systems" (what exactly is meant by that)?

- Why the actions/results mechanism?  Since this all seems to be based on
HTTP2, I would expect any HTTP2 server to allow multiple services, so
users can (or should be able to) easily implement their own service
alongside Flight on the same server.



On 10/08/2018 at 03:44, Wes McKinney wrote:
> hi folks,
> I left some feedback on this PR. If others could take a look
> (particularly at the .proto service definition) that would be useful.
> We should decide on an approach to getting multiple production-worthy
> Flight/RPC implementations ready to go. It would be a good goal to
> deliver end-to-end sending/receiving of data (between Python and Java,
> or between Python processes) in the next couple of releases.
> - Wes
> On Wed, May 30, 2018 at 12:44 PM, Jacques Nadeau <jacq...@apache.org> wrote:
>> Correct, I'm maintaining standard protobuf encoding so a consumer that
>> doesn't go byte by byte can still consume/produce the messages.
>> More impls: for sure.
>> On Wed, May 30, 2018 at 9:01 AM, Wes McKinney <wesmck...@gmail.com> wrote:
>>> I see; looking more closely, I see you've sidestepped the standard
>>> Protobuf serialization to write the stream as tagged components:
>>> https://github.com/apache/arrow/compare/master...jacques-n:flight#diff-02cfc9235e22653fce8a7636c9f95507R241
>>> and then read the fields of the message tag by tag:
>>> https://github.com/apache/arrow/compare/master...jacques-n:flight#diff-02cfc9235e22653fce8a7636c9f95507R159
>>> Would it be correct that if a GRPC implementation doesn't provide
>>> sufficient access to the byte stream (or if it doesn't care enough
>>> about zero copy) that you could allow GRPC to return an instance of
>>> the FlightData structure?
>>> I expect we'd want to see a few interoperable implementations (I
>>> suggest Java, C++, Go) to harden the fine details.
>>> - Wes
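To make the tag-by-tag idea concrete, here is a minimal Python sketch of the protobuf wire format as discussed above: each field is written by hand as a tag plus a length prefix, the body goes last, and a reader walks the fields tag by tag and hands back views into the received buffer rather than copies. Because the bytes follow the standard wire format, a stock protobuf parser could still decode the same message (with its usual copy). The field numbers and helper names (`write_flight_data`, `read_fields`) are hypothetical; this is not the Java code in Jacques's branch.

```python
# Hypothetical field numbers for a FlightData-like message; the real ones
# are defined in flight.proto and may differ.
HEADER_FIELD = 2
BODY_FIELD = 1000
LENGTH_DELIMITED = 2  # protobuf wire type for bytes/strings/sub-messages


def encode_varint(value: int) -> bytes:
    """Encode a non-negative int as a protobuf base-128 varint."""
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def decode_varint(buf, pos: int):
    """Decode a varint starting at buf[pos]; return (value, next_pos)."""
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def field_prefix(field_number: int, length: int) -> bytes:
    """Tag (field number + wire type) followed by the payload length."""
    tag = (field_number << 3) | LENGTH_DELIMITED
    return encode_varint(tag) + encode_varint(length)


def write_flight_data(header: bytes, body) -> list:
    """Emit the message as segments, body last; the body itself is not copied."""
    body_view = memoryview(body).cast("B")
    return [
        field_prefix(HEADER_FIELD, len(header)), header,
        field_prefix(BODY_FIELD, len(body_view)), body_view,
    ]


def read_fields(message) -> dict:
    """Walk the message tag by tag, returning zero-copy views of each field."""
    view = memoryview(message).cast("B")
    fields, pos = {}, 0
    while pos < len(view):
        tag, pos = decode_varint(view, pos)
        field_number, wire_type = tag >> 3, tag & 0x7
        if wire_type != LENGTH_DELIMITED:
            raise ValueError(f"unsupported wire type {wire_type}")
        length, pos = decode_varint(view, pos)
        fields[field_number] = view[pos:pos + length]
        pos += length
    return fields


# Usage: join the segments only to stand in for the transport.
wire = b"".join(write_flight_data(b"ipc-metadata", bytes(range(8))))
fields = read_fields(wire)
```

Putting the body last means everything a reader needs to place it (tag and length) arrives before the body bytes themselves.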
>>> On Mon, May 28, 2018 at 3:32 PM, Jacques Nadeau <jacq...@apache.org> wrote:
>>>> Cutting through the layers of GRPC will have to be done per language.
>>>> Assuming that each GRPC language implementation does a good job of
>>>> separating message encapsulation from the base library, this should be
>>>> straightforward-ish. I'm more hopeful about this now that I see
>>>> non-protobuf protocols being built on top of the base GRPC [1]. Working
>>>> out how to do this in each language will take some time studying that
>>>> language's GRPC internals, but it can be a secondary step once the
>>>> protocol is working (you can just pay for extra copies until then).
>>>> In my Java approach I believe I do one read copy and zero write copies
>>>> (this needs more testing), which was my target. (Getting to zero-copy on
>>>> read means a lot more complexity because your socket reading has to be
>>>> protocol-aware: even our bespoke layer in Dremio doesn't try to do that.
>>>> I'd guess KRPC does the same, but I haven't reviewed the code to confirm.)
>>>> I'll try to get some more slides, a README and a proper proposed patch up
>>>> soon.
>>>> [1] https://grpc.io/blog/flatbuffers
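As a rough analogy for the "zero write copies, one read copy" target, here is a minimal Python sketch using plain sockets rather than GRPC: the writer passes its existing buffers to a scatter/gather `socket.sendmsg` call instead of concatenating them, and the reader receives into a single preallocated buffer and slices views out of it. It is Unix-only (`sendmsg` is not available on Windows), the kernel still copies in both directions, and the helper names are hypothetical. Note that `recv_exact` has to know the total size up front; placing each body directly where it will finally live would require the reader to understand the framing as bytes arrive, which is roughly the protocol awareness mentioned above.

```python
import socket


def send_gathered(sock: socket.socket, buffers) -> None:
    """Send several buffers without concatenating them in user space first."""
    views = [memoryview(b).cast("B") for b in buffers]
    views = [v for v in views if len(v)]
    while views:
        sent = sock.sendmsg(views)
        # Drop buffers that went out completely, then trim a partially sent one.
        while views and sent >= len(views[0]):
            sent -= len(views[0])
            views.pop(0)
        if views and sent:
            views[0] = views[0][sent:]


def recv_exact(sock: socket.socket, size: int) -> memoryview:
    """Receive exactly `size` bytes into one preallocated buffer (the single read copy)."""
    buf = bytearray(size)
    view = memoryview(buf)
    received = 0
    while received < size:
        n = sock.recv_into(view[received:], size - received)
        if n == 0:
            raise EOFError("peer closed the connection early")
        received += n
    return view


# Usage: a header and a body travel as two segments but arrive in one buffer.
header, body = b"hdr:", bytes(range(16))
left, right = socket.socketpair()
send_gathered(left, [header, body])
received = recv_exact(right, len(header) + len(body))
body_view = received[len(header):]  # a view into the receive buffer, not another copy
left.close()
right.close()
```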
>>>> On Mon, May 28, 2018 at 1:05 AM, Wes McKinney <wesmck...@gmail.com> wrote:
>>>>> hey Jacques,
>>>>> This is great news, I look forward to digging into this. My biggest
>>>>> initial question is the Protobuf encapsulation, specifically:
>>>>> https://github.com/jacques-n/arrow/blob/flight/java/flight/src/main/protobuf/flight.proto#L99
>>>>> My understanding of Protocol Buffers is that on read, the "data_body"
>>>>> memory would be copied out of the serialized protobuf that came across
>>>>> the wire. Your comment in the .proto says this "comes last in the
>>>>> definition to help with sidecar patterns" -- my read is that it would
>>>>> be up to us to do our own sidecar implementation, similar to how
>>>>> Apache Kudu has zero-copy sidecars in their KRPC system [1] (the
>>>>> comment there describes pretty much exactly the problem we have). I
>>>>> saw that you also replied on a GRPC thread about this issue [2]. Could
>>>>> you summarize what (if anything) stands in the way of getting zero-copy
>>>>> on write and read?
>>>>> - Wes
>>>>> [1]: https://github.com/apache/kudu/blob/master/src/kudu/rpc/rpc_sidecar.h#L34
>>>>> [2]: https://github.com/grpc/grpc-java/issues/1054#issuecomment-391692087
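The sidecar pattern referred to above can be sketched as a small framing scheme, loosely modeled on the idea rather than on Kudu's code: a short prefix records the header length and the length of each large payload, the payloads are appended unmodified, and the receiver slices views out of the one received buffer instead of copying each payload out of a parsed message. The layout (little-endian uint32 header length, sidecar count, and per-sidecar lengths) and the helper names are made up for illustration; this is not KRPC's actual wire format.

```python
import struct


def frame_with_sidecars(header: bytes, sidecars) -> list:
    """Segments: [header length, sidecar count, sidecar lengths] + header + sidecars."""
    lengths = [memoryview(s).nbytes for s in sidecars]
    prefix = struct.pack(f"<II{len(lengths)}I", len(header), len(lengths), *lengths)
    return [prefix, header, *sidecars]


def parse_frame(frame):
    """Return the header and each sidecar as zero-copy views into `frame`."""
    view = memoryview(frame).cast("B")
    header_len, count = struct.unpack_from("<II", view, 0)
    lengths = struct.unpack_from(f"<{count}I", view, 8)
    pos = 8 + 4 * count  # skip the fixed fields and the length table
    header = view[pos:pos + header_len]
    pos += header_len
    sidecars = []
    for length in lengths:
        sidecars.append(view[pos:pos + length])
        pos += length
    return header, sidecars


# Usage: join the segments only to stand in for the transport.
frame = b"".join(frame_with_sidecars(b"row-batch-meta", [b"validity", b"values!!"]))
header, sidecars = parse_frame(frame)
```

Because all lengths precede the payloads, the receiver knows where every sidecar starts before touching it, which is what lets the payloads stay out of the (copying) message parser.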
>>>>> On Thu, May 24, 2018 at 6:57 AM, Jacques Nadeau <jacq...@apache.org> wrote:
>>>>>> FYI, if you want to see an example server that you can run with a
>>>>>> GRPC-generated client, you can run the ExampleFlightServer located at
>>>>>> [1]. A very basic 'test' using that class and a client is located at [2].
>>>>>> [1] https://github.com/jacques-n/arrow/tree/flight/java/flight/src/main/java/org/apache/arrow/flight/example
>>>>>> [2] https://github.com/jacques-n/arrow/blob/flight/java/flight/src/test/java/org/apache/arrow/flight/example/TestExampleServer.java
>>>>>> On Thu, May 24, 2018 at 11:51 AM, Jacques Nadeau <jacq...@apache.org> wrote:
>>>>>>> Hey All,
>>>>>>> I used my Strata talk today as a forcing function to make additional
>>>>>>> progress on a GRPC-based Arrow RPC protocol [1]. I’m calling it “Apache
>>>>>>> Arrow Flight”. You can take a look at the work here [2]. I’ll work on
>>>>>>> cleaning it up and explaining my thoughts about the protocol in the
>>>>>>> coming days.
>>>>>>> High level: use protobuf as an encapsulation format so that any client
>>>>>>> supported by GRPC will work. However, we can optimize the read/write
>>>>>>> path for targeted languages and hand-control the
>>>>>>> serialization/deserialization and memory handling. (I did that in this
>>>>>>> Java patch [3][4][5].) I also looked at starting to use GRPC-generated
>>>>>>> bindings within Python, but it looks like some glue code may be needed
>>>>>>> in the C++ layer since Python frequently delegates down to it. I am
>>>>>>> also still trying to understand GRPC back-pressure patterns and whether
>>>>>>> the protocol realistically needs to change to cover real-world
>>>>>>> high-performance use cases.
>>>>>>> I’ll send out some slides about the ideas and update the README, etc.,
>>>>>>> soon.
>>>>>>> Thanks,
>>>>>>> Jacques
>>>>>>> [1] https://github.com/jacques-n/arrow/blob/flight/java/flight/src/main/protobuf/flight.proto
>>>>>>> [2] http://github.com/jacques-n/arrow/
>>>>>>> [3] https://github.com/jacques-n/arrow/tree/flight/java/flight/src/main/java/org/apache/arrow/flight/grpc
>>>>>>> [4] https://github.com/jacques-n/arrow/blob/flight/java/flight/src/main/java/org/apache/arrow/flight/ArrowMessage.java#L253
>>>>>>> [5] https://github.com/jacques-n/arrow/blob/flight/java/flight/src/main/java/org/apache/arrow/flight/ArrowMessage.java#L185
