924060929 commented on PR #63366: URL: https://github.com/apache/doris/pull/63366#issuecomment-4668765617
Thanks for the thorough review! Addressed in 089459fbaae (review fixes) + 7c898c90050 (num_tasks). Verified against the local-shuffle regression (`test_local_shuffle_rqg_bugs`: 25 bugs, FE==BE, no crash) + the FE planner unit tests; `force_passthrough` additionally checked on a 4-BE cluster. ### @morrySnow - **[P1] throw instead of log (AddLocalExchange):** restored to `throw` — this was originally an assert, downgraded to a warning while debugging. Verified it never misfires across the regression + unit tests. - **[P2] make the regression fail the suite (`test_local_shuffle_rqg_bugs`):** Bug 3 now asserts FE==BE (row count + order-insensitive content); Bug 4/5 raise `assertTrue(false)` on an unexpected exception instead of only logging. (Bug 1/2/6/7/8 already asserted in their catch.) - **[P2] preserve `enable_broadcast_join_force_passthrough` (HashJoinNode):** now mirrored. Investigated on a 4-BE cluster with a forced non-serial probe (`experimental_ignore_storage_data_distribution=false`): it is a **no-op** there — `enforceRequire` only inserts a PASSTHROUGH local exchange to fan a *serial* (1-task) source out to N tasks; an already-N-task source satisfies passthrough, so no exchange is added (identical plan and results vs BE-native, no crash). The check is kept to match BE's intent; a true rebalance of a non-serial probe (forcing the exchange) is a perf-only follow-up. - **🟡 `resolveExchangeType` unused `child` param:** removed all three unused params (`translatorContext`/`parent`/`child`) — only `require` is read. Updated all 6 call sites + the unit test (also resolves the SetOperationNode null-`firstChild` note). - **🟠 `_propagate_local_exchange_num_tasks` convergence loops:** replaced the two iterative `while (changed)` passes with a single Kahn topological sweep over the pipeline DAG — each pipeline is visited once, after all its upstreams, so num_tasks is set in one pass. It can't loop infinitely; added a cycle `DCHECK` instead of a max-iteration guard. (Commit 7c898c90050.) - **🟡 build-side comment (HashJoinNode):** added a note that both probe and build sides require `BUCKET_HASH_SHUFFLE`. - **🟢 `_operator_id` starting from -1 / ✅ AggregationNode correctness:** thanks — acknowledged, no change needed. ### @BiteTheDDDDt - **`local_exchange_sink_operator` redundancy with `init_partitioner`:** extracted `_create_partitioner(state, bucket_count)`, now shared by `init()` (BE-native) and `init_partitioner()` (FE-planned) — removes the three duplicated hash-partitioner blocks. - **BE computing `required_data_distribution` / parallelism:** retained for the **BE-native fallback** path (`enable_local_shuffle_planner=false`). The FE-planned path doesn't read it, but keeping it preserves the BE-native planner as a fallback option. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
