zclllyybb commented on issue #64368: URL: https://github.com/apache/doris/issues/64368#issuecomment-4668362971
Breakwater-GitHub-Analysis-Slot: slot_e2a93862adf5

Initial triage: this looks very likely related to https://github.com/apache/doris/pull/64363, not to the `ARRAY<BIGINT>` expression or the recursive CTE SQL syntax itself.

The issue does not include an actual Doris version; the `Version` field contains the CTAS SQL instead. I checked current upstream `master` (`e8c06f265a2`) and the public PR above.

The exact error text is thrown by `Coordinator.findMaxParallelFragmentIndex()` when the legacy `Coordinator` sees a fragment whose leftmost node is not a `ScanNode` but which has no child fragments. Recursive CTE planning is expected to go through the Nereids distributed planner / `NereidsCoordinator` path; current recursive CTE analysis also requires `enable_nereids_distribute_planner=true`.

PR #64363 describes a very close failure mode: in a multi-FE proxy flow, `parseByNereids()` did not propagate the parsed statement into `ConnectContext.getStatementContext()`. As a result, `SessionVariable.canUseNereidsDistributePlanner()` returned false, Nereids distributed plans were skipped, `EnvFactory.createCoordinator()` chose the legacy `Coordinator`, and CTAS with a recursive CTE failed with `fragment has no children`.

Why this issue matches that path:

1. Same error text: `fragment has no children`.
2. Same statement class: CTAS / INSERT using `WITH RECURSIVE`.
3. PR #64363 is still open/draft, so the fix may not be in the reporter's build.

Missing information needed to confirm it:

1. Actual Doris version, commit hash, or build package.
2. Whether the SQL was executed through a follower FE / proxy path or directly on the master FE.
3. Values of `enable_nereids_planner`, `enable_nereids_distribute_planner`, and `enable_fallback_to_original_planner`.
4. The FE stack trace around the failed statement.
5. Whether the same SQL succeeds after applying PR #64363, or when executed directly on the master FE.

Suggested next steps:

1. If this was run through a multi-FE proxy path, first verify with PR #64363. If it fixes the case, this issue can be linked to that PR.
2. Add a regression case with the exact CTAS and explicit-schema `INSERT INTO` from this issue, not only a plain `SELECT` with `WITH RECURSIVE`.
3. If the issue still reproduces on a single FE or after PR #64363, investigate the recursive CTE DML sink path separately. The next suspicious area would be fragment/coordinator assignment for recursive CTE plans without a base-table scan.

I would not recommend disabling Nereids as a workaround here: recursive CTE is a Nereids path, and current code requires the Nereids distributed planner for recursive CTE analysis. A safer temporary workaround is to run against a build containing PR #64363, or to execute directly on the master FE if this is confirmed to be the proxy-path case.
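To make the suspected failure chain in the triage easier to follow, here is a minimal Python model of it. This is only an illustrative sketch, not Doris code: `Fragment`, `Context`, and every function below are hypothetical stand-ins, and the rule that the distributed-planner check needs a propagated parsed statement is an assumption taken from the description of PR #64363 above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Fragment:
    """Hypothetical stand-in for a plan fragment."""
    leftmost_is_scan: bool
    children: List["Fragment"] = field(default_factory=list)


@dataclass
class Context:
    """Hypothetical stand-in for per-connection state."""
    # None models the proxy flow where the parsed statement was not propagated.
    parsed_statement: Optional[str] = None
    enable_nereids_distribute_planner: bool = True


def can_use_distribute_planner(ctx: Context) -> bool:
    # Assumption: the check needs the session flag AND a propagated statement.
    return ctx.enable_nereids_distribute_planner and ctx.parsed_statement is not None


def choose_coordinator(ctx: Context) -> str:
    # Fall back to the legacy coordinator when the distributed planner is unusable.
    return "NereidsCoordinator" if can_use_distribute_planner(ctx) else "Coordinator"


def legacy_fragment_check(fragment: Fragment) -> None:
    # Models only the condition named in the triage: the leftmost node is not
    # a scan and the fragment has no child fragments.
    if not fragment.leftmost_is_scan and not fragment.children:
        raise RuntimeError("fragment has no children")


def run(ctx: Context, fragment: Fragment) -> str:
    coordinator = choose_coordinator(ctx)
    if coordinator == "Coordinator":
        legacy_fragment_check(fragment)
    return coordinator
```

In this model, a recursive CTE fragment with no base-table scan (`Fragment(leftmost_is_scan=False)`) runs fine when the context carries a parsed statement, and raises the modeled error when it does not. That is the distinction the "proxy path vs. master FE" question is meant to establish.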
