The question is whether to repurpose the existing FlightData
structure, and allow for the metadata field to be filled in and data
fields to be blank (as a control message), or to wrap the FlightData
structure in another structure that explicitly distinguishes between
control and data messages.

The other question is how to handle the metadata fields. So far, we've
used bytestring fields for application-defined data. This is workable
if you want to use Protobuf to define the contents of those fields,
but requires you to pack/unpack your Protobuf into/from the bytestring
field. If we instead used the Protobuf Any field, a dynamically typed
field, this would be more convenient, but then we'd be exposing
Protobuf types. We could alternatively use a combination of a type
field and a bytestring field, mimicking what the Protobuf Any type
looks like on the wire. I'm not sure this is actually cleaner in any
of the language APIs, though.

David

On 10/21/19, Antoine Pitrou <anto...@python.org> wrote:
>
> Can one of you explain what is being proposed in non-protobuf terms?
> Knowledge of protobuf shouldn't be required to use Flight.
>
> Regards
>
> Antoine.
>
>
> Le 21/10/2019 à 15:46, David Li a écrit :
>> Oneof doesn't actually change the wire encoding; it would just be
>> application-level logic. (The official guide doesn't even mention it
>> in the encoding docs; I found
>> https://stackoverflow.com/questions/52226409/how-protobuf-encodes-oneof-message-construct
>> as well.)
>>
>> If I follow you, Jacques, then you are proposing essentially inlining
>> the definition of Any, e.g.
>>
>> message FlightMessage {
>>   oneof message {
>>     FlightData data = 1;
>>     FlightAny metadata = 2;
>>   }
>> }
>>
>> message FlightAny {
>>   string type = 1;
>>   bytes data = 2;
>> }
>>
>> Is this correct?
>>
>> It might be nice to consider the wrapper message for DoGet/DoPut as
>> well, but at that point, I'd rather we be consistent with all of them,
>> rather than have one of the three methods do its own thing.
>>
>> Thanks,
>> David
>>
>> On 10/20/19, Jacques Nadeau <jacq...@apache.org> wrote:
>>> I think we could probably expose the oneof behavior without exposing the
>>> protobuf functions. On the any... hmm. I guess we could expose as two
>>> fields: type and data. Then users could use it for whatever but if
>>> people
>>> wanted to treat it as any, it would work. (Basically a user could use
>>> any
>>> with it easily but they could also use any other mechanism). At least in
>>> java, the any concepts are pretty simple/diy. Are other language
>>> bindings
>>> less diy?
>>>
>>> I'm *not* hardcore against the empty FlightData + metadata but it just
>>> seemed a bit janky.
>>>
>>> Thinking about the control message/wrapper object thing, I wonder if we
>>> should redefine DoPut and DoGet to have the same property if we think it
>>> is
>>> a good idea...
>>>
>>> On Wed, Oct 16, 2019 at 5:13 PM David Li <li.david...@gmail.com> wrote:
>>>
>>>> I was definitely considering having control messages without data, and
>>>> I thought that could be encoded by a FlightData with only app_metadata
>>>> set. I think I understand your position now: FlightData should always
>>>> carry (some) data (with optional metadata)?
>>>>
>>>> That makes sense to me, and is consistent with the documentation on
>>>> FlightData in the Protobuf file. I was worried about having a
>>>> redundant metadata field, but oneof prevents that from happening, and
>>>> overall having a clear separation between data and control messages is
>>>> cleaner.
>>>>
>>>> As for using Protobuf's Any: so far, we've refrained from exposing
>>>> Protobuf by using bytes, would we want to change that now?
>>>>
>>>> Best,
>>>> David
>>>>
>>>> On 10/16/19, Jacques Nadeau <jacq...@apache.org> wrote:
>>>>> Hey David,
>>>>>
>>>>> RE: Async: I was trying to match the pattern we use for doget/doput
>>>>> for
>>>>> async. Yes, more thinking java given java grpc's async always pattern.
>>>>>
>>>>> On the comment around the FlightData, I think it is overloading the
>>>> message
>>>>> to use metadata for this. If I want to send a control message
>>>> independently
>>>>> of the data message, I would have to define something like an empty
>>>> flight
>>>>> data message that has custom metadata. Why not support a container
>>>>> object
>>>>> with a oneof{FlightData, Any} in it instead so users can add more data
>>>>> as
>>>>> desired. The default impl could be a noop for the Any messages.
>>>>>
>>>>> On Tue, Oct 15, 2019 at 6:50 PM David Li <li.david...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi Jacques,
>>>>>>
>>>>>> Thanks for the comments.
>>>>>>
>>>>>> - I do agree DoExchange is a better name!
>>>>>> - FlightData already has metadata fields as a result of prior
>>>>>> proposals, so I don't think we need a new message to carry that kind
>>>>>> of information.
>>>>>> - I like the suggestion of an async handler to handle incoming
>>>>>> messages as the fundamental API; it would actually be quite natural
>>>>>> to
>>>>>> implement in Flight/Java. I will note that it's not possible in
>>>>>> C++/Python without spawning a thread, though. (In essence, gRPC-Java
>>>>>> is async-always and gRPC-C++ is sync-always.) There are experimental
>>>>>> C++ APIs that would let us do something similar to Java, but those
>>>>>> are
>>>>>> only in relatively recent gRPC versions and are still under
>>>>>> development (contrary to the interceptor APIs which have been around
>>>>>> for quite a while).
>>>>>>
>>>>>> Thanks,
>>>>>> David
>>>>>>
>>>>>> On 10/15/19, Jacques Nadeau <jacq...@apache.org> wrote:
>>>>>>> I like it. Added some comments to the doc. Might worth discussion
>>>>>>> here
>>>>>>> depending on your thoughts.
>>>>>>>
>>>>>>> On Tue, Oct 15, 2019 at 7:11 AM David Li <li.david...@gmail.com>
>>>> wrote:
>>>>>>>
>>>>>>>> Hey Ryan,
>>>>>>>>
>>>>>>>> Thanks for the comments.
>>>>>>>>
>>>>>>>> Concrete example: I've edited the doc to provide a Python strawman.
>>>>>>>>
>>>>>>>> Sync vs async: while I don't touch on it, you could interleave
>>>> uploads
>>>>>>>> and downloads if you were so inclined. Right now, synchronous APIs
>>>>>>>> make this error-prone, e.g. if both client and server wait for each
>>>>>>>> other due to an application logic bug. (gRPC doesn't give us the
>>>>>>>> ability to have per-read timeouts, only an overall timeout.) As an
>>>>>>>> example of this happening with DoPut, see ARROW-6063:
>>>>>>>> https://issues.apache.org/jira/browse/ARROW-6063
>>>>>>>>
>>>>>>>> This is mostly tangential though, eventually we will want to design
>>>>>>>> asynchronous APIs for Flight as a whole. A bidirectional stream
>>>>>>>> like
>>>>>>>> this (and like DoPut) just makes these pitfalls easier to run into.
>>>>>>>>
>>>>>>>> Using DoPut+DoGet: I discussed this in the proposal, but the main
>>>>>>>> concern is that depending on how you deploy, two separate calls
>>>>>>>> could
>>>>>>>> get routed to different instances. Additionally, gRPC has some
>>>>>>>> reconnection behaviors; if the server goes away in between the two
>>>>>>>> calls, but it then restarts or there is another instance available,
>>>>>>>> the client will happily reconnect to the new server without
>>>>>>>> warning.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> David
>>>>>>>>
>>>>>>>> On 10/15/19, Ryan Murray <rym...@dremio.com> wrote:
>>>>>>>>> Hey David,
>>>>>>>>>
>>>>>>>>> I think this proposal makes a lot of sense. I like it and the
>>>>>>>>> possibility
>>>>>>>>> of remote compute via arrow buffers. One thing that would help me
>>>>>> would
>>>>>>>> be
>>>>>>>>> a concrete example of the API in a real life use case. Also, what
>>>>>> would
>>>>>>>> the
>>>>>>>>> client experience be in terms of sync vs asyc? Would the client
>>>>>>>>> block
>>>>>>>> till
>>>>>>>>> the bidirectional call return ie c = flight.vector_mult(a, b) or
>>>>>>>>> would
>>>>>>>> the
>>>>>>>>> client wait to be signaled that computation was done. If the
>>>>>>>>> later
>>>>>>>>> how
>>>>>>>>> is
>>>>>>>>> that different from a DoPut then DoGet? I suppose that this could
>>>> be
>>>>>>>>> implemented without extending the RPC interface but rather by a
>>>>>>>>> function/util?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Best,
>>>>>>>>>
>>>>>>>>> Ryan
>>>>>>>>>
>>>>>>>>> On Sun, Oct 13, 2019 at 9:24 PM David Li <li.david...@gmail.com>
>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Hi all,
>>>>>>>>>>
>>>>>>>>>> We've been using Flight quite successfully so far, but we have
>>>>>>>>>> identified a new use case on the horizon: being able to both
>>>>>>>>>> send
>>>>>>>>>> and
>>>>>>>>>> retrieve Arrow data within a single RPC call. To that end, I've
>>>>>>>>>> written up a proposal for a new RPC method:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>
>>>>>>
>>>> https://docs.google.com/document/d/1Hh-3Z0hK5PxyEYFxwVxp77jens3yAgC_cpp0TGW-dcw/edit?usp=sharing
>>>>>>>>>>
>>>>>>>>>> Please let me know if you can't view or comment on the document.
>>>>>>>>>> I'd
>>>>>>>>>> appreciate any feedback; I think this is a relatively
>>>>>>>>>> straightforward
>>>>>>>>>> addition - it is essentially "DoPutThenGet".
>>>>>>>>>>
>>>>>>>>>> This is a format change and would require a vote. I've decided
>>>>>>>>>> to
>>>>>>>>>> table the other format change I had proposed (on DoPut), as it
>>>>>> doesn't
>>>>>>>>>> functionally change Flight, just the interpretation of the
>>>>>>>>>> semantics.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> David
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>>
>>>>>>>>> Ryan Murray  | Principal Consulting Engineer
>>>>>>>>>
>>>>>>>>> +447540852009 | rym...@dremio.com
>>>>>>>>>
>>>>>>>>> <https://www.dremio.com/>
>>>>>>>>> Check out our GitHub <https://www.github.com/dremio>, join our
>>>>>>>>> community
>>>>>>>>> site <https://community.dremio.com/> & Download Dremio
>>>>>>>>> <https://www.dremio.com/download>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>

Reply via email to