morningman opened a new pull request, #64799:
URL: https://github.com/apache/doris/pull/64799

   ### What problem does this PR solve?
   
   Issue Number: close #62259
   
   Related PR: #64797
   
   Problem Summary:
   
   Arrow Flight SQL queries against Iceberg (and other external) tables in 
batch split mode either crashed the BE or failed with `Split source X is released`.
   
   Arrow Flight executes a query in two phases: `GetFlightInfo` (plan + submit 
to BE) and `DoGet` (the client pulls results from the BE). For an external 
table scan in batch split mode, the BE keeps scanning during `DoGet` and lazily 
fetches file splits from the FE via the `fetchSplitBatch` RPC, using an async 
`SplitSource` that the FE coordinator holds (through its scan nodes).
   
   The FE closed the coordinator at the end of `GetFlightInfo` 
(`StmtExecutor.executeAndSendResult`'s `finally` → `Coordinator.close()` → 
`ScanNode.stop()` → `SplitSourceManager.removeSplitSource()`) and also 
unregistered it (`FlightSqlConnectProcessor.close()` → 
`StmtExecutor.finalizeQuery()`). So by the time the BE called `fetchSplitBatch` 
during `DoGet`, the `SplitSource` was already gone. The MySQL protocol is 
unaffected because plan + execute share one request, so the coordinator stays 
alive until all results are consumed.
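
   The ordering problem described above can be modelled in a few lines. This 
is only an illustrative sketch, not the Doris code: `SplitRegistrySketch` and 
its methods are hypothetical stand-ins for the FE-side split source registry, 
and the error text simply mirrors the message quoted in this PR.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the FE-side registry of split sources.
// This is NOT the real SplitSourceManager; names and shapes are assumptions.
class SplitRegistrySketch {
    private final Map<Long, List<String>> sources = new ConcurrentHashMap<>();

    void register(long id, List<String> splits) {
        sources.put(id, splits);
    }

    void remove(long id) {
        sources.remove(id);
    }

    // Models the lazy fetch the BE issues while it is still scanning.
    List<String> fetchSplitBatch(long id) {
        List<String> splits = sources.get(id);
        if (splits == null) {
            throw new IllegalStateException("Split source " + id + " is released");
        }
        return splits;
    }

    // Replays the buggy ordering: the source is removed at the end of
    // phase one, then the phase-two fetch arrives and finds nothing.
    static String eagerCloseThenFetch() {
        SplitRegistrySketch registry = new SplitRegistrySketch();
        registry.register(1L, List.of("file-0.orc", "file-1.orc"));
        registry.remove(1L);              // end of the GetFlightInfo phase
        try {
            registry.fetchSplitBatch(1L); // fetch issued during the DoGet phase
            return "ok";
        } catch (IllegalStateException e) {
            return e.getMessage();
        }
    }
}
```

   In this model the fetch succeeds only if `remove` runs after the last 
`fetchSplitBatch`, which is the ordering the MySQL path gets for free.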
   
   This PR keeps the coordinator (and its `SplitSource`) alive across the two 
phases and cleans it up reliably:
   
   - **StmtExecutor**: for an Arrow Flight query that produces results on the 
BE (`coordBase == coord`), mark it deferred, register the executor on the 
`ConnectContext`, and skip the eager `Coordinator.close()` in the `finally`. A 
failed query (whose `exec()` threw) is not deferred and is closed as before.
   - **ConnectContext**: hold the deferred executors and add 
`closeFlightSqlDeferredExecutors()`, which closes their coordinators (releasing 
the `SplitSource` and the query queue slot) and unregisters the queries.
   - **FlightSqlConnectProcessor.close()**: do not finalize deferred executors.
   - **DorisFlightSqlProducer**: finalize the previous query's deferred 
coordinator when the next query starts on the connection.
   - **FlightSqlConnectPoolMgr.unregisterConnection()**: finalize deferred 
coordinators when the connection is torn down. All teardown paths (idle/query 
timeout, bearer token expiry, explicit `CloseSession`) reach here, so an 
abandoned connection cannot leak the coordinator.
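
   The lifecycle in the list above can be sketched as follows. This is a 
simplified model under stated assumptions, not the actual patch: 
`CoordinatorSketch` and `ConnectionSketch` are hypothetical classes, and the 
real change spreads this logic across `StmtExecutor`, `ConnectContext`, 
`DorisFlightSqlProducer`, and `FlightSqlConnectPoolMgr`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a coordinator; it only records whether close() ran.
class CoordinatorSketch implements AutoCloseable {
    private boolean closed;

    @Override
    public synchronized void close() {
        closed = true; // stands in for releasing the split source and queue slot
    }

    synchronized boolean isClosed() {
        return closed;
    }
}

// Hypothetical model of a connection that holds deferred coordinators.
class ConnectionSketch {
    private final List<CoordinatorSketch> deferred = new ArrayList<>();

    // Phase one of a query: finalize whatever the previous query left
    // behind, then keep the new coordinator open for the result-pull phase.
    synchronized CoordinatorSketch getFlightInfo() {
        closeDeferred();
        CoordinatorSketch coord = new CoordinatorSketch();
        deferred.add(coord);
        return coord;
    }

    // Every teardown path funnels here, so nothing stays open after it.
    synchronized void teardown() {
        closeDeferred();
    }

    private void closeDeferred() {
        for (CoordinatorSketch coord : deferred) {
            coord.close();
        }
        deferred.clear();
    }
}
```

   The point of the model is that a coordinator has exactly two release 
points, the start of the next query and connection teardown, so it is neither 
closed too early nor left open indefinitely.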
   
   Non-Arrow-Flight paths (MySQL, internal tables, point queries) are 
unchanged: `deferredForArrowFlight` can only become true for `ARROW_FLIGHT_SQL`.
   
   The BE-side error-path hardening (so any `fetchSplitBatch` failure fails 
gracefully instead of crashing the BE) is handled separately in #64797.
   
   ### Release note
   
   Fix Arrow Flight SQL queries against external tables (e.g. Iceberg) failing 
with `Split source X is released` or crashing the BE in batch split mode.
   
   ### Check List (For Author)
   
   - Test
       - [x] Regression test
       - [ ] Unit Test
       - [ ] Manual test (add detailed scripts or steps below)
       - [ ] No need to test or manual test. Explain why:
           - [ ] This is a refactor/code format and no logic has been changed.
           - [ ] Previous test can cover this change.
           - [ ] No code files have been changed.
           - [ ] Other reason
   
   Added 
`regression-test/suites/external_table_p0/iceberg/test_iceberg_arrow_flight_split_source.groovy`. 
It forces batch split mode on the Arrow Flight session 
(`num_files_in_batch_mode=1`), asserts via `explain` that the scan really uses 
the batch `SplitSource` path (`approximate`) so it cannot silently pass on the 
non-batch path, then scans `format_v2.sample_cow_orc` over Arrow Flight and 
checks all rows come back. The test runs in the external (docker) pipeline and 
is skipped when the Iceberg env or the Arrow Flight endpoint is not configured.
   
   - Behavior changed:
       - [x] Yes. An Arrow Flight query's coordinator (and its external-table 
batch `SplitSource`) is now kept alive until the next query starts on the 
connection or the connection is torn down, instead of being closed at the end 
of `GetFlightInfo`.
   
   - Does this need documentation?
       - [x] No.
   
   🤖 Generated with [Claude Code](https://claude.com/claude-code)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

