zclllyybb commented on issue #64368:
URL: https://github.com/apache/doris/issues/64368#issuecomment-4668362971

   Initial triage: this looks very likely related to 
https://github.com/apache/doris/pull/64363, not to the `ARRAY<BIGINT>` 
expression or recursive CTE SQL syntax itself.
   
   The issue does not include an actual Doris version; the `Version` field 
contains the CTAS SQL. I checked current upstream `master` (`e8c06f265a2`) and 
the public PR above. The exact error text is thrown by 
`Coordinator.findMaxParallelFragmentIndex()` when the legacy `Coordinator` sees 
a fragment whose leftmost node is not a `ScanNode` but the fragment has no 
child fragments. Recursive CTE planning is expected to go through the Nereids 
distributed planner / `NereidsCoordinator` path; current recursive CTE analysis 
also requires `enable_nereids_distribute_planner=true`.
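
   The condition described above can be modeled with a small, self-contained 
sketch. This is not the Doris source: `FragmentCheckSketch`, `Fragment`, and 
`parallelismSource` are hypothetical names, and the real 
`Coordinator.findMaxParallelFragmentIndex()` does more than this. The sketch 
only shows why a fragment with no scan and no children leaves the legacy path 
with nothing to derive parallelism from.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified model; none of these types are the real Doris classes. */
public class FragmentCheckSketch {

    /** Hypothetical plan fragment holding only what the check looks at. */
    static final class Fragment {
        final boolean leftmostIsScan;
        final List<Fragment> children = new ArrayList<>();

        Fragment(boolean leftmostIsScan) {
            this.leftmostIsScan = leftmostIsScan;
        }
    }

    /**
     * Reports where a fragment would take its parallelism from in this model.
     * A fragment with neither a scan nor children has no source, which is the
     * state associated with the reported error text.
     */
    static String parallelismSource(Fragment fragment) {
        if (fragment.leftmostIsScan) {
            return "scan";
        }
        if (fragment.children.isEmpty()) {
            throw new IllegalStateException("fragment has no children");
        }
        return "child fragment";
    }

    public static void main(String[] args) {
        Fragment scanFragment = new Fragment(true);
        System.out.println(parallelismSource(scanFragment));

        Fragment parent = new Fragment(false);
        parent.children.add(scanFragment);
        System.out.println(parallelismSource(parent));

        // Assumed shape of a recursive CTE plan without a base-table scan.
        Fragment orphan = new Fragment(false);
        try {
            parallelismSource(orphan);
        } catch (IllegalStateException e) {
            System.out.println("error: " + e.getMessage());
        }
    }
}
```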
   
   PR #64363 describes a very close failure mode: in a multi-FE proxy flow, 
`parseByNereids()` did not propagate the parsed statement into 
`ConnectContext.getStatementContext()`. Then 
`SessionVariable.canUseNereidsDistributePlanner()` returned false, Nereids 
distributed plans were skipped, `EnvFactory.createCoordinator()` chose the 
legacy `Coordinator`, and CTAS with recursive CTE failed with `fragment has no 
children`.
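
   The chain of decisions in that failure mode can be sketched as follows. 
This is an illustrative model under stated assumptions, not the actual Doris 
code: `CoordinatorChoiceSketch`, `Context`, `canUseDistributePlanner`, and 
`chooseCoordinator` are hypothetical stand-ins, and the real 
`SessionVariable.canUseNereidsDistributePlanner()` checks more conditions than 
the two modeled here. It only shows how a missing parsed statement in the 
context could silently flip the coordinator choice even when the session 
variable is enabled.

```java
/** Simplified model of the coordinator choice; not the real Doris classes. */
public class CoordinatorChoiceSketch {

    /** Hypothetical stand-in for the connection and statement context. */
    static final class Context {
        // Models enable_nereids_distribute_planner.
        boolean distributePlannerEnabled = true;
        // Stays null when the parsed statement is never propagated.
        String parsedStatement;
    }

    /** Assumption: the check needs both the flag and a parsed statement. */
    static boolean canUseDistributePlanner(Context ctx) {
        return ctx.distributePlannerEnabled && ctx.parsedStatement != null;
    }

    /** Returns the name of the coordinator this model would pick. */
    static String chooseCoordinator(Context ctx) {
        return canUseDistributePlanner(ctx) ? "NereidsCoordinator" : "Coordinator";
    }

    public static void main(String[] args) {
        Context proxied = new Context();
        // Parsed statement was not propagated, so the legacy path is chosen.
        System.out.println(chooseCoordinator(proxied));

        Context direct = new Context();
        direct.parsedStatement = "CREATE TABLE t AS WITH RECURSIVE ...";
        System.out.println(chooseCoordinator(direct));
    }
}
```

   In this model the session variable alone is not enough, which is why 
checking the variable values and the execution path (proxy or direct) are 
both useful when confirming the diagnosis.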
   
   Why this issue matches that path:
   
   1. Same error text: `fragment has no children`.
   2. Same statement class: CTAS / INSERT using `WITH RECURSIVE`.
   3. PR #64363 is still open/draft, so the fix may not be in the reporter's 
build.
   
   Missing information needed to confirm it:
   
   1. Actual Doris version, commit hash, or build package.
   2. Whether the SQL was executed through a follower FE / proxy path or 
directly on the master FE.
   3. Values of `enable_nereids_planner`, `enable_nereids_distribute_planner`, 
and `enable_fallback_to_original_planner`.
   4. The FE stack trace around the failed statement.
   5. Whether the same SQL succeeds after applying PR #64363, or when executed 
directly on the master FE.
   
   Suggested next steps:
   
   1. If this was run through a multi-FE proxy path, first retest on a build 
with PR #64363 applied. If that fixes the case, this issue can be linked to that PR.
   2. Add a regression case with the exact CTAS and explicit-schema `INSERT 
INTO` from this issue, not only plain `SELECT WITH RECURSIVE`.
   3. If the issue still reproduces on a single FE or after PR #64363, then 
investigate the recursive CTE DML sink path separately. The next suspicious 
area would be fragment/coordinator assignment for recursive CTE plans without a 
base-table scan.
   
   I would not recommend disabling Nereids as a workaround here. 
Recursive CTE is a Nereids path, and current code requires the Nereids 
distributed planner for recursive CTE analysis. A safer temporary workaround is 
to run against a build containing PR #64363 or execute directly on the master 
FE if this is confirmed to be the proxy-path case.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

