I think it'd make sense. Do you want to propose a section in the doc?

I think it'd make the most sense to just define enums/flags in Flight SQL 
instead of arbitrary options. (A corresponding set of SqlInfo values could 
indicate support for each of them.)

On Thu, Feb 16, 2023, at 16:16, Matthew Topol wrote:
> While implementing Transaction handling for ADBC via Flight SQL's
> transaction primitives, another potential enhancement would be to expand
> the BeginTransaction request to include a spot for "options" such as
> IsolationLevel or marking a transaction as ReadOnly.
>
> Anyone have thoughts on this?
>
> On Wed, Feb 15, 2023 at 10:19 AM David Li <lidav...@apache.org> wrote:
>
>> The ADBC and Flight SQL proposals have been updated for
>> Micah/Taeyun/Will's comments.
>>
>> On Wed, Feb 15, 2023, at 09:17, David Li wrote:
>> > Hi Taeyun,
>> >
>> > Thanks for the detailed feedback!
>> >
>> > - I will clarify that PollFlightInfo should return as quickly as
>> > possible on the first call, and that updates in progress value are also
>> > OK (though the server shouldn't spam updates). (I wanted to avoid
>> > streaming calls as it does not work as well with browser-based gRPC
>> > clients.)
>> > - I will clarify cancel_descriptor to note that it is optional.
>> > - I wanted to avoid adding several new RPC methods, but if there is
>> > rough agreement that these would be generally useful, I will add them
>> > and deprecate the Flight SQL message [3]. (We could also possibly
>> > define 'standard' DoAction Protobuf messages, but I worry about
>> > implementation [1]. I may prototype this first, since then we could
>> > avoid having redundant paths in Flight RPC/Flight SQL.) If we do this,
>> > I think we do not need cancel_descriptor. (It can work like
>> > CancelQuery.)
>> > - I meant that CancelQuery should work with a partial FlightInfo from a
>> > PollFlightInfo response. However this doesn't work if there's no
>> > endpoints in the response! I will add app_metadata fields to
>> > FlightInfo/FlightEndpoint. I think this can also be useful for
>> > applications that need to add their own semantics to these messages
>> > anyways, since Ticket is not meant to be parsed by the client. (You
>> > could stuff the info into the schema, but that also doesn't work if the
>> > schema is not yet known.)
>> >
>> > As for the partial DoGet: I think this is interesting and we can
>> > discuss. Google BigQuery Storage supports this use case [2]. As you
>> > note, if you are using this to request only a few rows, you may not
>> > benefit much from Arrow.
>> >
>> > [1]: The C++ Protobuf library makes it difficult to define and share
>> > messages across multiple shared libraries. On Windows, protoc does not
>> > properly insert dllimport/dllexport macros (despite claiming to), and
>> > on Unixes Protobuf interacts oddly with our linker script/symbol
>> > hiding. This would be a lot of work, but I wonder if we could use an
>> > implementation like upb/nanopb that does not rely on global state for
>> > Arrow. This would also hopefully ease conflicts with projects that want
>> > to use their own Protobuf definitions - as with Substrait. The main
>> > challenge here is getting them to work with gRPC; I think we would have
>> > to handroll the gRPC code that is normally generated. This may not be
>> > too bad, just undocumented/may not be a stable API, and it would also
>> > let us avoid the iffy casting we currently do to bypass gRPC's
>> > serialization.
>> > [2]:
>> >
>> https://github.com/googleapis/googleapis/blob/1870ba2163526fa9fba63bf899c92707476d4603/google/cloud/bigquery/storage/v1/storage.proto#L268-L282
>> > [3]: It may be time to consider explicit versioning of Flight
>> > RPC/Flight SQL?
>> >
>> > On Tue, Feb 14, 2023, at 20:45, Taeyun Kim wrote:
>> >> Hi David,
>> >>
>> >> Thank you very much for your proposal.
>> >> My comments about it are as follows:
>> >>
>> >> About PollFlightInfo:
>> >>
>> >> Many SQL queries (in fact, almost all OLAP queries?) cannot produce any
>> >> output records until it completes - because of GROUP BY or ORDER BY
>> clause.
>> >> In that case, PollFlightInfo can degenerate to GetFlightInfo since the
>> >> server will not respond unless there are changes to the result. If the
>> >> 'progress' field of RetryInfo is also regarded as the result, the
>> server can
>> >> respond with a different progress value. But the server that does not
>> know
>> >> the progress information cannot use that.
>> >> The client can call the RPC with a timeout to avoid arbitrarily long
>> >> polling, but in that case, the client would not be able to get a
>> descriptor
>> >> for cancellation of the query if the first PollFlightInfo does not
>> return
>> >> soon. Maybe it should be specified that the server processing
>> PollFlightInfo
>> >> must return immediately after it parses the query and starts executing
>> it to
>> >> provide the cancel_descriptor as soon as possible.
>> >> Regarding cancel_descriptor, it would be nice for the server to unset it
>> >> even if the query is still in progress, to notify the client that the
>> query
>> >> cancellation is not supported.
>> >> BTW, I thought of something like StreamingGetFlightInfo, which is a
>> >> bidirectional streaming version of PollFlightInfo. But maybe
>> PollFlightInfo
>> >> is better since the other client that does not own the GRPC call stream
>> can
>> >> cancel the query. (Or maybe StreamingGetFlightInfo can send
>> >> cancel_descriptor for use outside the stream.)
>> >>
>> >> About CloseQuery:
>> >>
>> >> I think that it would be great if the RPC call is in Flight RPC rather
>> than
>> >> in FlightSQL RPC since the FlightInfo that it tries to close is got from
>> >> GetFlightInfo/PollFlightInfo in Flight RPC. In that case, maybe it
>> would be
>> >> nice to name it 'CloseFlightInfo', to be matched with GetFlightInfo.
>> >>
>> >> About RefreshQuery:
>> >>
>> >> Same as CloseQuery. Maybe it can be named 'RetainFlightInfo'.
>> >>
>> >> About CancelQuery:
>> >>
>> >> I don't know how to use it. CancenQuery requires FlightInfo from the
>> server.
>> >> But by the time the client receives FlightInfo, the query has been
>> already
>> >> completed, doesn't it?
>> >>
>> >> Another (unrelated?) request (not in the proposal):
>> >>
>> >> In DoGet, the client must consume the whole endpoint. It can make it
>> >> difficult for a client who only wants to or can retrieve only a small
>> >> portion of it. (For example, there may be a web client that displays the
>> >> result in tabular format page-by-page. A web server can cache the DoGet
>> >> result, but by doing that the web server must manage a state. A
>> stateful web
>> >> server is harder to implement and manage.) Can we have a variant of
>> DoGet
>> >> that only retrieves a portion of an endpoint? That RPC method can have
>> >> record_offset and record_count arguments. (Maybe it defeats the purpose
>> of
>> >> Flight RPC which prefers fast, bulk transfer.)
>> >>
>> >> Thank you.
>> >>
>> >> -----Original Message-----
>> >> From: David Li <lidav...@apache.org>
>> >> Sent: Wednesday, February 15, 2023 8:06 AM
>> >> To: dev@arrow.apache.org
>> >> Subject: Re: [DISCUSS] Flight RPC/Flight SQL/ADBC enhancements
>> >>
>> >> Ah, right. I haven't written up the last set of ADBC proposals yet.
>> I'll do
>> >> that in the next day or two.
>> >>
>> >> On Tue, Feb 14, 2023, at 17:38, Will Jones wrote:
>> >>> Hi David,
>> >>>
>> >>> The proposals in the Flight/Flight SQL document look excellent. As
>> >>> I've been looking at ADBC I've been wondering about polling / async
>> >>> execution, cancellation, and progress indicators. Glad to see those in
>> >>> the Flight document, but where are they in the ADBC issues? Do they
>> >>> still need to be created?
>> >>>
>> >>> Best,
>> >>>
>> >>> Will Jones
>> >>>
>> >>> On Tue, Feb 14, 2023 at 12:58 PM David Li <lidav...@apache.org> wrote:
>> >>>
>> >>>> Hello,
>> >>>>
>> >>>> I would like to submit some Flight RPC and Flight SQL enhancements
>> >>>> for discussion. They cover the following:
>> >>>>
>> >>>> - Executing 'queries' in a retryable, nonblocking way
>> >>>> - Handling ordered result sets
>> >>>> - Handling expiration of/re-reading result sets
>> >>>>
>> >>>> In addition, there are corresponding proposals for ADBC in
>> >>>> anticipation of these features, James's catalogs proposal for Flight
>> >>>> SQL, and other feedback.
>> >>>>
>> >>>> The Flight proposals are described in this document [1]. It should be
>> >>>> open for comments.
>> >>>> The ADBC proposals are filed as individual issues in this milestone
>> [2].
>> >>>>
>> >>>> Any feedback is much appreciated. There are not yet prototype
>> >>>> implementations, but if there is a rough consensus then I can begin on
>> >> that.
>> >>>>
>> >>>> [1]:
>> >>>> https://docs.google.com/document/d/1jhPyPZSOo2iy0LqIJVUs9KWPyFULVFJXT
>> >>>> ILDfkadx2g/edit?usp=sharing
>> >>>> [2]: https://github.com/apache/arrow-adbc/milestone/3
>> >>>>
>> >>>> Thanks,
>> >>>> David
>> >>>>
>>

Reply via email to