morningman commented on code in PR #64799:
URL: https://github.com/apache/doris/pull/64799#discussion_r3487272034


##########
fe/fe-core/src/main/java/org/apache/doris/service/arrowflight/DorisFlightSqlProducer.java:
##########
@@ -189,6 +189,10 @@ private FlightInfo executeQueryStatement(String 
peerIdentity, ConnectContext con
         try {
             Preconditions.checkState(null != connectContext);
             Preconditions.checkState(!query.isEmpty());
+            // Finalize the previous query's coordinator on this connection 
whose close was
+            // deferred (Arrow Flight keeps it alive across GetFlightInfo -> 
DoGet so the BE can
+            // fetch external-table splits during DoGet). By now the previous 
DoGet is done. #62259
+            connectContext.closeFlightSqlDeferredExecutors();
             // After the previous query was executed, there was no 
getStreamStatement to take away the result.

Review Comment:
   I looked into this and don't think it's a regression this PR introduces, nor 
something that can be properly fixed within this PR's scope. Details:
   
   **Not a new failure mode / not a regression.** Before this PR, *every* 
external-table batch-split query over Arrow Flight already failed with `Split 
source X is released`, because the coordinator was closed at the end of 
`GetFlightInfo` — sequential queries included (that is exactly #62259). After 
this PR the sequential case works; the interleaved case you describe fails as 
it did before. So nothing is "reintroduced" — interleaving never worked.
   
   **Interleaving two queries on one session is not a supported usage.** A 
bearer token maps to a single shared `ConnectContext` 
(`FlightSessionsWithTokenManager`), and that context is 
single-query-by-construction: each `GetFlightInfo` resets the shared 
`FlightSqlChannel`, clears the single `flightSqlEndpointsLocations` list, and 
reuses one `queryId`. Two concurrent/interleaved queries on one session corrupt 
all of that, not just the deferred coordinator — the same way sharing one JDBC 
`Connection` across threads is unsafe. The supported pattern (drain the result, 
then issue the next query) makes "the previous DoGet is done by the time the 
next query starts" hold, since the BE has consumed the SplitSource by the time 
the client reaches end-of-stream.
   
   **The suggested fix needs a signal the FE does not have.** For an 
external-table scan the result `DoGet` endpoint points directly at the BE; the 
FE is not in that data path and gets no completion signal when the BE-side 
stream finishes. "Keep the coordinator until the result stream completes" or 
"serialize/reject a new statement while a result is live" both require a new BE 
-> FE completion notification (or session-level serialization keyed on it), 
which is a separate architectural change out of scope here.
   
   **Leaks are already bounded.** The "or the session is closed" fallback is 
implemented: `FlightSqlConnectPoolMgr.unregisterConnection()` -> 
`closeFlightSqlDeferredExecutors()` covers idle/query timeout, bearer-token 
expiry and explicit `CloseSession`. And the BE-side hardening in #64797 makes a 
stale `fetchSplitBatch` fail gracefully instead of crashing the BE.
   
   If first-class concurrent Flight clients are ever needed, the right 
follow-up is a BE -> FE completion signal plus session serialization; I can 
track that separately.
   



##########
fe/fe-core/src/main/java/org/apache/doris/service/arrowflight/FlightSqlConnectProcessor.java:
##########
@@ -196,11 +196,20 @@ public void fetchArrowFlightSchema(int timeoutMs) {
     @Override
     public void close() throws Exception {
         ctx.setCommand(MysqlCommand.COM_SLEEP);
+        // Executors whose results are pulled from the BE keep their 
coordinator alive past
+        // GetFlightInfo (registered as deferred executors on the 
ConnectContext) so the BE can
+        // still fetch external-table splits during DoGet. Do NOT finalize 
those here; they are
+        // finalized when the next query starts or the connection is torn 
down. Executors that are
+        // not deferred (local results, or a query that already failed) are 
finalized now. See #62259.
         for (StmtExecutor asynExecutor : returnResultFromRemoteExecutor) {
-            asynExecutor.finalizeQuery();
+            if (!asynExecutor.isDeferredForArrowFlight()) {
+                asynExecutor.finalizeQuery();
+            }

Review Comment:
   Confirmed — this is a real (bounded) leak. If the Arrow schema fetch in 
`executeQueryStatement` fails after the coordinator was already deferred during 
planning, `FlightSqlConnectProcessor.close()` skips the deferred executor and 
the outer catch only rethrows, so the coordinator (its external-table batch 
SplitSource, the query queue slot and the query registration) stays alive until 
the next query starts or the connection is torn down — even though no `DoGet` 
will ever pull this query's results.
   
   Fixed in 41a79a659df by finalizing the deferred coordinator on the 
`GetFlightInfo` error path: `executeQueryStatement`'s catch now calls 
`connectContext.closeFlightSqlDeferredExecutors()`. At that point the list 
holds only this failed query's coordinator (the previous one was already 
finalized at the top of the method), and it covers every post-deferral failure, 
not just the schema fetch.
   
   Added 
`DorisFlightSqlProducerTest.testGetFlightInfoFinalizesDeferredExecutorWhenSchemaFetchFails`,
 which defers a coordinator then fails the schema fetch and asserts the 
deferred executor is finalized (verified to fail without the fix).
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to