I think it'd make sense. Do you want to propose a section in the doc? I think it'd make the most sense to just define enums/flags in Flight SQL instead of arbitrary options. (A corresponding set of SqlInfo values could indicate support for each of them.)
On Thu, Feb 16, 2023, at 16:16, Matthew Topol wrote: > While implementing Transaction handling for ADBC via Flight SQL's > transaction primitives, another potential enhancement would be to expand > the BeginTransaction request to include a spot for "options" such as > IsolationLevel or marking a transaction as ReadOnly. > > Anyone have thoughts on this? > > On Wed, Feb 15, 2023 at 10:19 AM David Li <lidav...@apache.org> wrote: > >> The ADBC and Flight SQL proposals have been updated for >> Micah/Taeyun/Will's comments. >> >> On Wed, Feb 15, 2023, at 09:17, David Li wrote: >> > Hi Taeyun, >> > >> > Thanks for the detailed feedback! >> > >> > - I will clarify that PollFlightInfo should return as quickly as >> > possible on the first call, and that updates in progress value are also >> > OK (though the server shouldn't spam updates). (I wanted to avoid >> > streaming calls as it does not work as well with browser-based gRPC >> > clients.) >> > - I will clarify cancel_descriptor to note that it is optional. >> > - I wanted to avoid adding several new RPC methods, but if there is >> > rough agreement that these would be generally useful, I will add them >> > and deprecate the Flight SQL message [3]. (We could also possibly >> > define 'standard' DoAction Protobuf messages, but I worry about >> > implementation [1]. I may prototype this first, since then we could >> > avoid having redundant paths in Flight RPC/Flight SQL.) If we do this, >> > I think we do not need cancel_descriptor. (It can work like >> > CancelQuery.) >> > - I meant that CancelQuery should work with a partial FlightInfo from a >> > PollFlightInfo response. However this doesn't work if there's no >> > endpoints in the response! I will add app_metadata fields to >> > FlightInfo/FlightEndpoint. I think this can also be useful for >> > applications that need to add their own semantics to these messages >> > anyways, since Ticket is not meant to be parsed by the client. (You >> > could stuff the info into the schema, but that also doesn't work if the >> > schema is not yet known.) >> > >> > As for the partial DoGet: I think this is interesting and we can >> > discuss. Google BigQuery Storage supports this use case [2]. As you >> > note, if you are using this to request only a few rows, you may not >> > benefit much from Arrow. >> > >> > [1]: The C++ Protobuf library makes it difficult to define and share >> > messages across multiple shared libraries. On Windows, protoc does not >> > properly insert dllimport/dllexport macros (despite claiming to), and >> > on Unixes Protobuf interacts oddly with our linker script/symbol >> > hiding. This would be a lot of work, but I wonder if we could use an >> > implementation like upb/nanopb that does not rely on global state for >> > Arrow. This would also hopefully ease conflicts with projects that want >> > to use their own Protobuf definitions - as with Substrait. The main >> > challenge here is getting them to work with gRPC; I think we would have >> > to handroll the gRPC code that is normally generated. This may not be >> > too bad, just undocumented/may not be a stable API, and it would also >> > let us avoid the iffy casting we currently do to bypass gRPC's >> > serialization. >> > [2]: >> > >> https://github.com/googleapis/googleapis/blob/1870ba2163526fa9fba63bf899c92707476d4603/google/cloud/bigquery/storage/v1/storage.proto#L268-L282 >> > [3]: It may be time to consider explicit versioning of Flight >> > RPC/Flight SQL? >> > >> > On Tue, Feb 14, 2023, at 20:45, Taeyun Kim wrote: >> >> Hi David, >> >> >> >> Thank you very much for your proposal. >> >> My comments about it are as follows: >> >> >> >> About PollFlightInfo: >> >> >> >> Many SQL queries (in fact, almost all OLAP queries?) cannot produce any >> >> output records until it completes - because of GROUP BY or ORDER BY >> clause. >> >> In that case, PollFlightInfo can degenerate to GetFlightInfo since the >> >> server will not respond unless there are changes to the result. If the >> >> 'progress' field of RetryInfo is also regarded as the result, the >> server can >> >> respond with a different progress value. But the server that does not >> know >> >> the progress information cannot use that. >> >> The client can call the RPC with a timeout to avoid arbitrarily long >> >> polling, but in that case, the client would not be able to get a >> descriptor >> >> for cancellation of the query if the first PollFlightInfo does not >> return >> >> soon. Maybe it should be specified that the server processing >> PollFlightInfo >> >> must return immediately after it parses the query and starts executing >> it to >> >> provide the cancel_descriptor as soon as possible. >> >> Regarding cancel_descriptor, it would be nice for the server to unset it >> >> even if the query is still in progress, to notify the client that the >> query >> >> cancellation is not supported. >> >> BTW, I thought of something like StreamingGetFlightInfo, which is a >> >> bidirectional streaming version of PollFlightInfo. But maybe >> PollFlightInfo >> >> is better since the other client that does not own the GRPC call stream >> can >> >> cancel the query. (Or maybe StreamingGetFlightInfo can send >> >> cancel_descriptor for use outside the stream.) >> >> >> >> About CloseQuery: >> >> >> >> I think that it would be great if the RPC call is in Flight RPC rather >> than >> >> in FlightSQL RPC since the FlightInfo that it tries to close is got from >> >> GetFlightInfo/PollFlightInfo in Flight RPC. In that case, maybe it >> would be >> >> nice to name it 'CloseFlightInfo', to be matched with GetFlightInfo. >> >> >> >> About RefreshQuery: >> >> >> >> Same as CloseQuery. Maybe it can be named 'RetainFlightInfo'. >> >> >> >> About CancelQuery: >> >> >> >> I don't know how to use it. CancenQuery requires FlightInfo from the >> server. >> >> But by the time the client receives FlightInfo, the query has been >> already >> >> completed, doesn't it? >> >> >> >> Another (unrelated?) request (not in the proposal): >> >> >> >> In DoGet, the client must consume the whole endpoint. It can make it >> >> difficult for a client who only wants to or can retrieve only a small >> >> portion of it. (For example, there may be a web client that displays the >> >> result in tabular format page-by-page. A web server can cache the DoGet >> >> result, but by doing that the web server must manage a state. A >> stateful web >> >> server is harder to implement and manage.) Can we have a variant of >> DoGet >> >> that only retrieves a portion of an endpoint? That RPC method can have >> >> record_offset and record_count arguments. (Maybe it defeats the purpose >> of >> >> Flight RPC which prefers fast, bulk transfer.) >> >> >> >> Thank you. >> >> >> >> -----Original Message----- >> >> From: David Li <lidav...@apache.org> >> >> Sent: Wednesday, February 15, 2023 8:06 AM >> >> To: dev@arrow.apache.org >> >> Subject: Re: [DISCUSS] Flight RPC/Flight SQL/ADBC enhancements >> >> >> >> Ah, right. I haven't written up the last set of ADBC proposals yet. >> I'll do >> >> that in the next day or two. >> >> >> >> On Tue, Feb 14, 2023, at 17:38, Will Jones wrote: >> >>> Hi David, >> >>> >> >>> The proposals in the Flight/Flight SQL document look excellent. As >> >>> I've been looking at ADBC I've been wondering about polling / async >> >>> execution, cancellation, and progress indicators. Glad to see those in >> >>> the Flight document, but where are they in the ADBC issues? Do they >> >>> still need to be created? >> >>> >> >>> Best, >> >>> >> >>> Will Jones >> >>> >> >>> On Tue, Feb 14, 2023 at 12:58 PM David Li <lidav...@apache.org> wrote: >> >>> >> >>>> Hello, >> >>>> >> >>>> I would like to submit some Flight RPC and Flight SQL enhancements >> >>>> for discussion. They cover the following: >> >>>> >> >>>> - Executing 'queries' in a retryable, nonblocking way >> >>>> - Handling ordered result sets >> >>>> - Handling expiration of/re-reading result sets >> >>>> >> >>>> In addition, there are corresponding proposals for ADBC in >> >>>> anticipation of these features, James's catalogs proposal for Flight >> >>>> SQL, and other feedback. >> >>>> >> >>>> The Flight proposals are described in this document [1]. It should be >> >>>> open for comments. >> >>>> The ADBC proposals are filed as individual issues in this milestone >> [2]. >> >>>> >> >>>> Any feedback is much appreciated. There are not yet prototype >> >>>> implementations, but if there is a rough consensus then I can begin on >> >> that. >> >>>> >> >>>> [1]: >> >>>> https://docs.google.com/document/d/1jhPyPZSOo2iy0LqIJVUs9KWPyFULVFJXT >> >>>> ILDfkadx2g/edit?usp=sharing >> >>>> [2]: https://github.com/apache/arrow-adbc/milestone/3 >> >>>> >> >>>> Thanks, >> >>>> David >> >>>> >>