zclllyybb commented on issue #63854:
URL: https://github.com/apache/doris/issues/63854#issuecomment-4564184950

   Breakwater-GitHub-Analysis-Slot: slot_aa996a8b764f
   
   Initial analysis: this looks like a valid, code-backed FE/Nereids planner 
performance bug.
   
   What I checked on current master:
   
   - `PruneOlapScanTablet` builds `selectedTabletIds` by iterating all selected 
partitions and appending the per-partition pruned tablets 
(`fe/fe-core/src/main/java/org/apache/doris/nereids/rules/rewrite/PruneOlapScanTablet.java`).
   - `PhysicalPlanTranslator.computePhysicalOlapScan` stores that global list 
as `OlapScanNode.nereidsPrunedTabletIds`.
   - `OlapScanNode.computeTabletInfo()` then loops over `selectedPartitionIds` 
and calls `distributionPrune(..., isNereids && !isPointQuery)` for each 
partition.
   - In the Nereids branch, `distributionPrune()` ignores the current 
partition's `tabletIdsInOrder` and returns `new 
ArrayList<>(nereidsPrunedTabletIds)`. The caller then probes 
`selectedTable.getTablet(id)` twice for every id in that global set. 
`MaterializedIndex.getTablet()` is a HashMap lookup, so ids from other 
partitions still pay the lookup cost before being skipped.
   
   Conclusion:
   
   The reported complexity issue is real on current master. It should normally 
be correctness-neutral because tablets belonging to other partitions are 
skipped by `selectedTable.getTablet(id) != null`, but it makes Nereids FE 
planning cost scale roughly with `selected_partitions * 
globally_pruned_tablets` once tablet pruning has populated `selectedTabletIds`. 
The many-partitions/few-buckets workload described in the issue is the bad case 
for this shape.
   
   This also matches the referenced #53403 change: that change avoids 
re-running hash distribution pruning in `computeTabletInfo()`, but the 
replacement global `nereidsPrunedTabletIds` set is consumed inside a 
per-partition loop without slicing it back to partition-local tablets.
   
   Suggested fix direction:
   
   - In the Nereids path, make `distributionPrune()` return only the tablets 
that belong to the current partition, for example by intersecting the current 
partition's `tabletIdsInOrder` with `nereidsPrunedTabletIds`, or by carrying 
selected tablets in a partition-keyed structure before translation.
   - Avoid the double `selectedTable.getTablet(id)` lookup by storing the 
lookup result once before the null check.
   - Add a regression test that covers multiple selected partitions with a 
global Nereids-selected tablet set and verifies that each partition only 
walks/returns its own tablets. A small performance-oriented FE planner test 
with many partitions and few buckets would be useful because the current 
behavior is functionally correct but asymptotically wrong.
   
   Missing information that would still help maintainers:
   
   - Exact affected build/tag for `4.x` (`master`, a specific 4.0/4.1 tag, or a 
cloud tag) so backport scope can be decided.
   - If available, attach the concrete generated DDL and FE profile or 
async-profiler output. The code path is already enough to confirm the bug, but 
those artifacts would help quantify the impact and validate the fix.
   
   Recommended next step: treat this as a Nereids planner performance 
regression introduced by the #53403 fix path, and review a focused patch in 
`OlapScanNode.distributionPrune()` plus the regression coverage above.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to