Hi,

I would like to propose support for long-running queries in
Flight RPC.

See the following pull request and discussion for details:

* GH-36155: [C++][Go][Java][FlightRPC] Add support for long-running queries
  https://github.com/apache/arrow/pull/36946

* [DISCUSS][Format][Flight] Long-running queries support
  https://lists.apache.org/thread/qcjpcw6m3p15wqxp6n6rqzlx01v1fl3v

This is based on one of the proposals in:

  [DISCUSS] Flight RPC/Flight SQL/ADBC enhancements
  https://lists.apache.org/thread/247z3t06mf132nocngc1jkp3oqglz7jp

  Google Docs: (Arrow ML) Arrow Flight RPC/Flight SQL Proposals
  https://docs.google.com/document/d/1jhPyPZSOo2iy0LqIJVUs9KWPyFULVFJXTILDfkadx2g/edit#heading=h.anpr1q5slm1v

Summary:

* Background: Queries generally don't complete instantly (as
  much as we would like them to). So where in the Flight RPC
  protocol should the 'query evaluation time' go?

  * In GetFlightInfo: block and wait for the query to complete.
    * Con: this is a long-running blocking call, which may
      fail or time out. Then when the client retries, the
      server has to redo all the work.
    * Con: parts of the result may be ready before others, but
      the client can't do anything until everything is ready.

  * In DoGet: GetFlightInfo returns a fixed number of
    partitions right away, and the query is evaluated as the
    client reads them with DoGet.
    * Con: this makes handling worker failures hard. Systems
      like Trino support fault-tolerant execution by replacing
      workers at runtime. But GetFlightInfo has already
      returned, so we can't notify the client of new workers.
    * Con: we have to know or fix the partitioning up front.

  Neither solution is optimal.

* Proposal: Add PollFlightInfo as a pollable version of
  GetFlightInfo. Clients can poll the current query status
  and start reading the results available so far before the
  query completes.

* Changes:

  * Add PollFlightInfo and PollInfo

    Flight.proto:
      https://github.com/apache/arrow/pull/36946/files#diff-53b6c132dcc789483c879f667a1c675792b77aae9a056b257d6b20287bb09dba
    Documentation:
      http://crossbow.voltrondata.com/pr_docs/36946/format/Flight.html#downloading-data-by-running-a-heavy-query

  * The pull request includes reference implementations for
    C++, Go and Java.
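
To make the polling flow concrete, here is a minimal,
self-contained Python sketch. It does not use the real Flight
API: PollInfo, FakeServer, poll_flight_info() and run_query()
are hypothetical, simplified stand-ins. It assumes that each
poll response lists all endpoints available so far, that an
unset next descriptor means the query has completed, and that
the descriptor returned by the server identifies the running
query, so a retried poll resumes it instead of redoing the work.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass
class PollInfo:
    """Hypothetical, simplified stand-in for the proposed PollInfo message."""
    endpoints: List[str]              # all results available so far (assumed cumulative)
    next_descriptor: Optional[str]    # descriptor for the next poll; None when complete
    progress: Optional[float] = None  # optional progress hint in [0.0, 1.0]


class FakeServer:
    """Toy in-memory server: every poll finishes one more partition of a query."""

    def __init__(self, partitions_per_query: int) -> None:
        self._total = partitions_per_query
        self._finished: Dict[str, int] = {}  # query ID -> partitions finished
        self._next_id = 0

    def poll_flight_info(self, descriptor: str) -> PollInfo:
        if descriptor in self._finished:
            # The descriptor identifies a running query: resume it, don't restart it.
            query_id = descriptor
        else:
            # Any other descriptor is treated as a new command: start a query.
            query_id = f"query:{self._next_id}"
            self._next_id += 1
            self._finished[query_id] = 0
        done = min(self._finished[query_id] + 1, self._total)
        self._finished[query_id] = done
        return PollInfo(
            endpoints=[f"{query_id}/part-{i}" for i in range(done)],
            next_descriptor=None if done == self._total else query_id,
            progress=done / self._total,
        )


def run_query(server: FakeServer, command: str) -> List[str]:
    """Poll until the query completes, reading each endpoint exactly once."""
    read: List[str] = []
    seen: Set[str] = set()
    descriptor: Optional[str] = command
    while descriptor is not None:
        info = server.poll_flight_info(descriptor)
        for endpoint in info.endpoints:
            if endpoint not in seen:
                seen.add(endpoint)
                read.append(endpoint)  # a real client would call DoGet here
        # A real client would wait (e.g. with backoff) before polling again.
        descriptor = info.next_descriptor
    return read


if __name__ == "__main__":
    for endpoint in run_query(FakeServer(partitions_per_query=3), "SELECT 42"):
        print("DoGet", endpoint)
```

In this sketch the client, not the server, tracks which
endpoints it has already read, so it can start reading partial
results early, and a repeated or retried poll costs only a
status check rather than a full re-execution.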


The vote will be open for at least 72 hours.

[ ] +1 Accept this proposal
[ ] +0
[ ] -1 Do not accept this proposal because...


Thanks,
-- 
kou
