I'm sorry I haven't had the time I wanted to spend on implementation, I'd
still like to get to it but cannot commit to a timeline for a little bit.
if anybody would like to take on the implementation work, I'm happy to
review.

On Wed, Apr 27, 2022 at 9:06 AM Micah Kornfield <emkornfi...@gmail.com>
wrote:

> Yes, next step is implementation which I've been delayed on.  I hope to
> have a little time this week to work on it and will post an update.
>
> On Wed, Apr 27, 2022 at 7:48 AM David Li <lidav...@apache.org> wrote:
>
>> Following up here - what are the next steps? The RFC PR looks fairly
>> complete, maybe we can help build out implementations in C++/Java/other
>> languages in preparation for a vote?
>>
>> On Wed, Mar 9, 2022, at 00:23, Micah Kornfield wrote:
>> >>
>> >> The operation flow would be like this, or what would it look like?
>> >> Client ---> GetFlightInfo (query/update operation in payload) --->
>> Server
>> >> ---> Results (non-streamed)
>> >
>> >
>> > This is roughly the flow I was imagining if the server chooses to send
>> back
>> > inlined data.
>> >
>> > -Micah
>> >
>> > On Tue, Mar 8, 2022 at 11:27 AM Gavin Ray <ray.gavi...@gmail.com>
>> wrote:
>> >
>> >> Thank you for doing this, left a few questions on the GH issue
>> >>
>> >> I would adopt this proposal as soon as it makes it into nightlies
>> >> (or possibly earlier if it's just a matter of regenerating the proto
>> >> definitions)
>> >>
>> >> The operation flow would be like this, or what would it look like?
>> >>
>> >> Client ---> GetFlightInfo (query/update operation in payload) --->
>> Server
>> >> ---> Results (non-streamed)
>> >>
>> >>
>> >>
>> >>
>> >> On Tue, Mar 8, 2022 at 2:04 PM Micah Kornfield <emkornfi...@gmail.com>
>> >> wrote:
>> >>
>> >>> Some people have already left comments on
>> >>> https://github.com/apache/arrow/pull/12571  More eyes on it would be
>> >>> appreciated.  If there aren't more comments, I'll try to start
>> >>> implementing
>> >>> this feature in Flight next week, and hopefully have a vote after it
>> is
>> >>> supported in Java and C++/Python.
>> >>>
>> >>>
>> >>> Thanks,
>> >>> Micah
>> >>>
>> >>> On Fri, Mar 4, 2022 at 10:54 PM Micah Kornfield <
>> emkornfi...@gmail.com>
>> >>> wrote:
>> >>>
>> >>> > I put together straw-man proposal in PR [1] for the Flight changes.
>> >>> > Ultimately, it seemed based on the use-cases discussed inlining the
>> >>> data on
>> >>> > the Ticket made the most sense.  This might be overly complex (I'm
>> not
>> >>> sure
>> >>> > how I feel about a enum indicating partial vs full results) but
>> welcome
>> >>> > feedback.  Once we get consensus on this proposal, I can add
>> changes to
>> >>> > Flight SQL and try to provide reference implementations.
>> >>> >
>> >>> > [1] https://github.com/apache/arrow/pull/12571
>> >>> >
>> >>> > On Tue, Mar 1, 2022 at 10:51 PM Micah Kornfield <
>> emkornfi...@gmail.com>
>> >>> > wrote:
>> >>> >
>> >>> >> Would it make sense to make this part of DoGet since it
>> >>> >>> still would be returning a record batch
>> >>> >>
>> >>> >> I would lean against this. I think in many cases the client doesn't
>> >>> know
>> >>> >> the size of the data that it expects.  Leaving the flexibility on
>> the
>> >>> >> server side to send back inlined data when it thinks it makes
>> sense,
>> >>> or a
>> >>> >> bunch of tickets when there is in fact a lot of data seems like the
>> >>> best
>> >>> >> option here.
>> >>> >>
>> >>> >> For cases like previewing data, you usually just want to get a
>> small
>> >>> >>> amount
>> >>> >>> of data quickly.
>> >>> >>
>> >>> >> This is interesting and might be an additional use case.  If we did
>> >>> >> decide to extend FlightInfo we might also want a way of annotating
>> >>> inlined
>> >>> >> data with its corresponding ticket.  That way even for large
>> results,
>> >>> you
>> >>> >> could still send back a small preview if desired.
>> >>> >>
>> >>> >> After considering it a little bit I think I'm sold that inlined
>> data
>> >>> >> should not replace a ticket.  So in my mind the open question is
>> >>> whether
>> >>> >> the client needs to actively opt-in to inlined data.  The
>> scenarios I
>> >>> could
>> >>> >> come with where inlined data isn't useful are:
>> >>> >> 1.  The client is an old client and isn't aware inline data might
>> be
>> >>> >> returned.  In this case the main cost is of extra data on the wire
>> and
>> >>> >> storing it as unknown fields [1].
>> >>> >> 2.  The client is a new client but still doesn't want to get inline
>> >>> data
>> >>> >> (it might want to distribute all consumption to other processes).
>> Same
>> >>> >> cost is paid as option 1.
>> >>> >>
>> >>> >> Are there other scenarios?  If servers choose reasonable limits on
>> what
>> >>> >> data to inline, the extra complexity of negotiating with the
>> client in
>> >>> this
>> >>> >> case might not be worth the benefits.
>> >>> >>
>> >>> >> Cheers,
>> >>> >> Micah
>> >>> >>
>> >>> >>
>> >>> >> [1]
>> >>> https://developers.google.com/protocol-buffers/docs/proto3#unknowns
>> >>> >>
>> >>> >> On Tue, Mar 1, 2022 at 10:01 PM Bryan Cutler <cutl...@gmail.com>
>> >>> wrote:
>> >>> >>
>> >>> >>> I think this would be a useful feature and be nice to have in
>> Flight
>> >>> >>> core.
>> >>> >>> For cases like previewing data, you usually just want to get a
>> small
>> >>> >>> amount
>> >>> >>> of data quickly. Would it make sense to make this part of DoGet
>> since
>> >>> it
>> >>> >>> still would be returning a record batch? Perhaps a Ticket could be
>> >>> made
>> >>> >>> to
>> >>> >>> have an optional FlightDescriptor that would serve as an
>> all-in-one
>> >>> shot?
>> >>> >>>
>> >>> >>> On Tue, Mar 1, 2022 at 8:44 AM David Li <lidav...@apache.org>
>> wrote:
>> >>> >>>
>> >>> >>> > I agree with something along Antoine's proposal, though: maybe
>> we
>> >>> >>> should
>> >>> >>> > be more structured with the flags (akin to what Micah mentioned
>> with
>> >>> >>> the
>> >>> >>> > Feature enum).
>> >>> >>> >
>> >>> >>> > Also, the flag could be embedded into the Flight SQL messages
>> >>> instead.
>> >>> >>> (So
>> >>> >>> > in effect, Flight would only add the capability to return data
>> with
>> >>> >>> > FlightInfo, and it's up to applications, like Flight SQL, to
>> decide
>> >>> how
>> >>> >>> > they want to take advantage of that.)
>> >>> >>> >
>> >>> >>> > I think having a completely separate method and return type and
>> >>> having
>> >>> >>> to
>> >>> >>> > poll for it beforehand somewhat defeats the purpose of having
>> >>> it/would
>> >>> >>> be
>> >>> >>> > much harder of a transition.
>> >>> >>> >
>> >>> >>> > Also: it should be `repeated FlightInfo inline_data` right? In
>> case
>> >>> we
>> >>> >>> > also need dictionary batches?
>> >>> >>> >
>> >>> >>> > On Tue, Mar 1, 2022, at 11:39, Antoine Pitrou wrote:
>> >>> >>> > > Can we just add the following field to the FlightDescriptor
>> >>> message:
>> >>> >>> > >
>> >>> >>> > >   bool accept_inline_data = 4;
>> >>> >>> > >
>> >>> >>> > > and this one to the FlightInfo message:
>> >>> >>> > >
>> >>> >>> > >   FlightData inline_data = 100;
>> >>> >>> > >
>> >>> >>> > > Then new clients can `accept_inline_data` to true (the default
>> >>> being
>> >>> >>> > > false if omitted) to signal servers that they can put the
>> data if
>> >>> >>> > > `inline_data` if deemed small enough.
>> >>> >>> > >
>> >>> >>> > > (the `accept_inline_data` field could also be used to the
>> Criteria
>> >>> >>> > > message)
>> >>> >>> > >
>> >>> >>> > >
>> >>> >>> > > Alternatively, if the FlightDescriptor expansion looks a bit
>> dirty
>> >>> >>> > > (FlightDescriptor being used in other contexts where
>> >>> >>> > > `accept_inline_data` makes no sense), we can instead define a
>> new
>> >>> >>> > > method:
>> >>> >>> > >
>> >>> >>> > >   rpc GetFlightInfoEx(GetFlightInfoRequest) returns
>> (FlightInfo)
>> >>> {}
>> >>> >>> > >
>> >>> >>> > > with:
>> >>> >>> > >
>> >>> >>> > > message GetFlightInfoRequest {
>> >>> >>> > >   FlightDescriptor flight_descriptor = 1;
>> >>> >>> > >   bool accept_inline_data = 2;
>> >>> >>> > > }
>> >>> >>> > >
>> >>> >>> > > Regards
>> >>> >>> > >
>> >>> >>> > > Antoine.
>> >>> >>> > >
>> >>> >>> > >
>> >>> >>> > > On Mon, 28 Feb 2022 11:29:12 -0800
>> >>> >>> > > James Duong <jam...@bitquilltech.com.INVALID> wrote:
>> >>> >>> > >> This seems reasonable, however we need to account for
>> existing
>> >>> >>> Flight
>> >>> >>> > >> clients that were written before this.
>> >>> >>> > >>
>> >>> >>> > >> It seems like the server will need to still handle the ticket
>> >>> >>> returned
>> >>> >>> > for
>> >>> >>> > >> getStream() for clients that are unaware of the small result
>> >>> >>> > optimization.
>> >>> >>> > >>
>> >>> >>> > >> On Mon, Feb 28, 2022 at 11:26 AM David Li <
>> lidav...@apache.org>
>> >>> >>> wrote:
>> >>> >>> > >>
>> >>> >>> > >> > Ah, that makes more sense, that would be a reasonable
>> >>> extension to
>> >>> >>> > Flight
>> >>> >>> > >> > overall. (While we're at it, I think it would help to have
>> an
>> >>> >>> > app_metadata
>> >>> >>> > >> > field in FlightInfo as well.)
>> >>> >>> > >> >
>> >>> >>> > >> > On Mon, Feb 28, 2022, at 14:24, Micah Kornfield wrote:
>> >>> >>> > >> > >>
>> >>> >>> > >> > >> But it seems reasonable to add a one-shot query path
>> using
>> >>> >>> DoGet.
>> >>> >>> > >> > >
>> >>> >>> > >> > >
>> >>> >>> > >> > > I was thinking more of adding a bytes field to FlightInfo
>> >>> that
>> >>> >>> > could
>> >>> >>> > >> > store
>> >>> >>> > >> > > arrow data.  That way GetFlightInfo would be the only RPC
>> >>> >>> necessary
>> >>> >>> > for
>> >>> >>> > >> > > small results when executing a CMD.  The client doesn't
>> >>> >>> necessarily
>> >>> >>> > know
>> >>> >>> > >> > > whether a query will return large or small results.
>> >>> >>> > >> > >
>> >>> >>> > >> > > On Mon, Feb 28, 2022 at 11:04 AM David Li <
>> >>> lidav...@apache.org>
>> >>> >>> > wrote:
>> >>> >>> > >> > >
>> >>> >>> > >> > >> I think the focus was on large result sets (though I
>> don't
>> >>> >>> recall
>> >>> >>> > this
>> >>> >>> > >> > >> being discussed before) and supporting multi-node setups
>> >>> (hence
>> >>> >>> > >> > >> GetFlightInfo/DoGet are separated). But it seems
>> reasonable
>> >>> to
>> >>> >>> add
>> >>> >>> > a
>> >>> >>> > >> > >> one-shot query path using DoGet.
>> >>> >>> > >> > >>
>> >>> >>> > >> > >> On Mon, Feb 28, 2022, at 13:32, Adam Lippai wrote:
>> >>> >>> > >> > >> > I saw the same. A small, stateless query ability
>> would be
>> >>> >>> nice
>> >>> >>> > >> > >> (connection
>> >>> >>> > >> > >> > open, initialization, query in one message, the
>> resultset
>> >>> in
>> >>> >>> > the
>> >>> >>> > >> > response
>> >>> >>> > >> > >> > in one message)
>> >>> >>> > >> > >> >
>> >>> >>> > >> > >> > On Mon, Feb 28, 2022, 13:12 Micah Kornfield <
>> >>> >>> > emkornfi...@gmail.com>
>> >>> >>> > >> > >> wrote:
>> >>> >>> > >> > >> >
>> >>> >>> > >> > >> >> I'm rereviewing the Flight SQL interfaces, and I'm
>> not
>> >>> sure
>> >>> >>> if
>> >>> >>> > I'm
>> >>> >>> > >> > >> missing
>> >>> >>> > >> > >> >> it but is there any optimization for small results?
>> My
>> >>> >>> concern
>> >>> >>> > is
>> >>> >>> > >> > that
>> >>> >>> > >> > >> the
>> >>> >>> > >> > >> >> overhead of the RPCs for the DoGet after executing
>> the
>> >>> query
>> >>> >>> > could
>> >>> >>> > >> > add
>> >>> >>> > >> > >> >> non-trivial latency for smaller results.
>> >>> >>> > >> > >> >>
>> >>> >>> > >> > >> >> Has anybody else thought about this/investigated
>> it?  Am
>> >>> I
>> >>> >>> > >> > understanding
>> >>> >>> > >> > >> >> this correctly?
>> >>> >>> > >> > >> >>
>> >>> >>> > >> > >> >> Thanks,
>> >>> >>> > >> > >> >> Micah
>> >>> >>> > >> > >> >>
>> >>> >>> > >> > >>
>> >>> >>> > >> >
>> >>> >>> > >>
>> >>> >>> > >>
>> >>> >>> >
>> >>> >>>
>> >>> >>
>> >>>
>> >>
>>
>

Reply via email to