Re: [I] [Bug](nereids) distributionPrune doesn't slice nereidsPrunedTabletIds per partition [doris]

via GitHub Thu, 28 May 2026 05:58:10 -0700


zclllyybb commented on issue #63854:
URL: https://github.com/apache/doris/issues/63854#issuecomment-4564184950

Breakwater-GitHub-Analysis-Slot: slot_aa996a8b764f

Initial analysis: this looks like a valid, code-backed FE/Nereids planner
performance bug.

What I checked on current master:

- `PruneOlapScanTablet` builds `selectedTabletIds` by iterating all selected
partitions and appending the per-partition pruned tablets
(`fe/fe-core/src/main/java/org/apache/doris/nereids/rules/rewrite/PruneOlapScanTablet.java`).
- `PhysicalPlanTranslator.computePhysicalOlapScan` stores that global list
as `OlapScanNode.nereidsPrunedTabletIds`.
- `OlapScanNode.computeTabletInfo()` then loops over `selectedPartitionIds`
and calls `distributionPrune(..., isNereids && !isPointQuery)` for each
partition.
- In the Nereids branch, `distributionPrune()` ignores the current
partition's `tabletIdsInOrder` and returns `new
ArrayList<>(nereidsPrunedTabletIds)`. The caller then probes
`selectedTable.getTablet(id)` twice for every id in that global set.
`MaterializedIndex.getTablet()` is a HashMap lookup, so ids from other
partitions still pay the lookup cost before being skipped.

Conclusion:

The reported complexity issue is real on current master. It should normally
be correctness-neutral because tablets belonging to other partitions are
skipped by `selectedTable.getTablet(id) != null`, but it makes Nereids FE
planning cost scale roughly with `selected_partitions *
globally_pruned_tablets` once tablet pruning has populated `selectedTabletIds`.
The many-partitions/few-buckets workload described in the issue is the bad case
for this shape.

This also matches the referenced #53403 change: that change avoids
re-running hash distribution pruning in `computeTabletInfo()`, but the
replacement global `nereidsPrunedTabletIds` set is consumed inside a
per-partition loop without slicing it back to partition-local tablets.

Suggested fix direction:

- In the Nereids path, make `distributionPrune()` return only the tablets
that belong to the current partition, for example by intersecting the current
partition's `tabletIdsInOrder` with `nereidsPrunedTabletIds`, or by carrying
selected tablets in a partition-keyed structure before translation.
- Avoid the double `selectedTable.getTablet(id)` lookup by storing the
lookup result once before the null check.
- Add a regression test that covers multiple selected partitions with a
global Nereids-selected tablet set and verifies that each partition only
walks/returns its own tablets. A small performance-oriented FE planner test
with many partitions and few buckets would be useful because the current
behavior is functionally correct but asymptotically wrong.

Missing information that would still help maintainers:

- Exact affected build/tag for `4.x` (`master`, a specific 4.0/4.1 tag, or a
cloud tag) so backport scope can be decided.
- If available, attach the concrete generated DDL and FE profile or
async-profiler output. The code path is already enough to confirm the bug, but
those artifacts would help quantify the impact and validate the fix.

Recommended next step: treat this as a Nereids planner performance
regression introduced by the #53403 fix path, and review a focused patch in
`OlapScanNode.distributionPrune()` plus the regression coverage above.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [I] [Bug](nereids) distributionPrune doesn't slice nereidsPrunedTabletIds per partition [doris]

Reply via email to