Hi,

I would like to propose adding support for result set
expiration to Apache Arrow Flight. If anyone has comments
on this proposal, please share them here or on the issue
for this proposal:
https://github.com/apache/arrow/issues/35500

This is one of the proposals in "[DISCUSS] Flight RPC/Flight
SQL/ADBC enhancements":

  https://lists.apache.org/thread/247z3t06mf132nocngc1jkp3oqglz7jp

See also the "Flight RPC: Result Set Expiration" section in
the design document for the proposals:

  https://docs.google.com/document/d/1jhPyPZSOo2iy0LqIJVUs9KWPyFULVFJXTILDfkadx2g/edit#

Changes since the original proposal:

* Pre-defined action names:
  * CancelQuery -> CancelFlightInfo
  * RefreshQuery -> RefreshFlightEndpoint
  * CloseQuery -> CloseFlightInfo
  See also the following discussions:
  * Query -> FlightInfo:
    https://lists.apache.org/thread/71pp95q6yklodm6lfjttswr3slfowdrb
  * RefreshQuery -> RefreshFlightEndpoint:
    https://github.com/apache/arrow/issues/35500#issuecomment-1578200076

Background:

Currently, it is undefined whether a client can call DoGet
more than once. Clients may want to retry requests, and
servers may not want to persist a query result forever.

Proposal:

Add an expiration time to FlightEndpoint. If present,
clients may assume they can retry DoGet requests. Otherwise,
clients should avoid retrying DoGet requests.

This proposal is not a full retry protocol.
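
As a rough illustration of the client-side rule above, here is a
minimal sketch. It is not taken from the pull request: Endpoint
and may_retry_do_get are hypothetical names, not the Flight API,
and treating an expiration time in the past as "do not retry" is
my assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class Endpoint:
    """Hypothetical stand-in for FlightEndpoint; not the real Flight API."""
    ticket: bytes
    expiration_time: Optional[datetime] = None  # None means "not set"


def may_retry_do_get(endpoint: Endpoint, now: datetime) -> bool:
    """Return True if a client may assume that DoGet can be retried."""
    if endpoint.expiration_time is None:
        # No expiration time: retry behavior is undefined, so avoid retrying.
        return False
    # Assumption: retrying is only safe until the expiration time passes.
    return now < endpoint.expiration_time


now = datetime(2023, 6, 1, tzinfo=timezone.utc)
with_expiry = Endpoint(b"t1", now + timedelta(minutes=5))
without_expiry = Endpoint(b"t2")
expired = Endpoint(b"t3", now - timedelta(seconds=1))
```

Here only with_expiry would be considered safe to retry.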

Also, add "pre-defined" actions to Flight RPC for working
with result sets. These are pre-defined Protobuf messages
with standardized encodings for use with DoAction:

  * CancelFlightInfo: Asynchronously cancel the execution of
    a distributed query. (Replaces the equivalent Flight SQL
    action.)
  * RefreshFlightEndpoint: Request an extension of the
    expiration of a FlightEndpoint.
  * CloseFlightInfo: Close a FlightInfo so that the server
    can clean up resources early.

This lets the ADBC/JDBC/ODBC drivers for Flight SQL
explicitly manage result set lifetimes. These actions can be
used with Flight SQL as regular actions.
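
To make the lifetime management concrete, here is a toy sketch
of how a server might track expiration, refresh and close. It is
only an illustration under my own assumptions: ToyResultStore
and its methods are hypothetical, it does not use the Protobuf
messages or DoAction, and the "extend from now" policy is just
one possible choice. CancelFlightInfo is left out because it
concerns a running query rather than a stored result.

```python
from datetime import datetime, timedelta, timezone


class ToyResultStore:
    """Hypothetical in-memory server state; not part of Flight RPC."""

    def __init__(self):
        self._expires_at = {}  # ticket (bytes) -> expiration time

    def register(self, ticket, expires_at):
        self._expires_at[ticket] = expires_at

    def is_available(self, ticket, now):
        expires_at = self._expires_at.get(ticket)
        return expires_at is not None and now < expires_at

    def refresh_flight_endpoint(self, ticket, now, extension):
        # Extend the expiration of a result set that is still held.
        if not self.is_available(ticket, now):
            raise KeyError("result set is closed or expired")
        self._expires_at[ticket] = now + extension
        return self._expires_at[ticket]

    def close_flight_info(self, tickets):
        # Release resources early instead of waiting for expiration.
        for ticket in tickets:
            self._expires_at.pop(ticket, None)


now = datetime(2023, 6, 1, tzinfo=timezone.utc)
store = ToyResultStore()
store.register(b"t1", now + timedelta(minutes=1))
new_expiry = store.refresh_flight_endpoint(b"t1", now, timedelta(minutes=10))
store.close_flight_info([b"t1"])
closed_available = store.is_available(b"t1", now)
```

After the close call the result set is no longer available, and
a later refresh for the same ticket fails in this sketch.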

Implementation:

https://github.com/apache/arrow/pull/36009 is an
implementation of this proposal. The pull request contains
the following:

1. Format changes:
   * format/Flight.proto
     https://github.com/apache/arrow/pull/36009/files#diff-53b6c132dcc789483c879f667a1c675792b77aae9a056b257d6b20287bb09dba
   * format/FlightSql.proto
     https://github.com/apache/arrow/pull/36009/files#diff-fd4e5266a841a2b4196aadca76a4563b6770c91d400ee53b6235b96da628a01e

2. Documentation changes:
   docs/source/format/Flight.rst
   https://github.com/apache/arrow/pull/36009/files#diff-839518fb41e923de682e8587f0b6fdb00eb8f3361d360c2f7249284a136a7d89

3. The C++ implementation and an integration test:
   * cpp/src/arrow/flight/

4. The Go implementation and an integration test:
   * go/arrow/flight/
   * go/arrow/internal/flight_integration/

The Java implementation may be added to this pull request.

Next:

I'll start a vote on this proposal once we reach a
consensus.

This is the standard process for format changes.
See also:
https://arrow.apache.org/docs/dev/format/Changing.html


Thanks,
-- 
kou
