github-actions[bot] commented on code in PR #64559:
URL: https://github.com/apache/doris/pull/64559#discussion_r3488926694
##########
fe/fe-core/src/main/java/org/apache/doris/nereids/cost/CostModel.java:
##########
@@ -567,7 +567,7 @@ public Cost visitPhysicalNestedLoopJoin(
nljPenalty = Math.min(leftStatistics.getRowCount(),
rightStatistics.getRowCount());
}
return Cost.of(context.getSessionVariable(),
- leftStatistics.getRowCount() * rightStatistics.getRowCount(),
+ leftStatistics.getRowCount() * rightStatistics.getRowCount() *
nljPenalty,
Review Comment:
`nljPenalty` can be below `1.0`, so applying it to the CPU term can make an
NLJ cheaper than it was before this PR instead of penalizing it. Row-count
estimates are not restricted to integers: `Statistics.withSel()` returns
`rowCount * selectivity` directly, so filter estimation can produce `0.5` rows
from a one-row input with an unknown equality predicate. For example, with
`0.5` left rows and `1000` right rows, this branch sets `nljPenalty = 0.5`, and
the new CPU cost becomes `0.5 * 1000 * 0.5 = 250` instead of the previous
`0.5 * 1000 = 500`.
That is the opposite of the comment's goal of discouraging bad NLJ choices.
Please clamp the multiplier used for CPU to at least `1.0` (for example
`Math.max(1.0, nljPenalty)`) or otherwise skip this penalty when the selected
row-count estimate is below one row.
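
The arithmetic above can be checked with a small standalone sketch. It is not
the Doris code: `NljPenaltySketch` and its methods are hypothetical names, and
it models only the branch shown in the diff, where the penalty is the smaller
of the two row-count estimates.

```java
// Standalone illustration only; class and method names are hypothetical
// and do not exist in Doris.
class NljPenaltySketch {
    // Assumption: models just the branch from the diff, where the penalty
    // is the smaller of the two row-count estimates.
    static double penalty(double leftRows, double rightRows) {
        return Math.min(leftRows, rightRows);
    }

    // CPU term before the PR: plain product of the row counts.
    static double cpuCostBefore(double leftRows, double rightRows) {
        return leftRows * rightRows;
    }

    // CPU term as changed by the PR: the penalty is applied unclamped.
    static double cpuCostUnclamped(double leftRows, double rightRows) {
        return leftRows * rightRows * penalty(leftRows, rightRows);
    }

    // CPU term with the suggested clamp: the multiplier never drops below 1.0.
    static double cpuCostClamped(double leftRows, double rightRows) {
        return leftRows * rightRows * Math.max(1.0, penalty(leftRows, rightRows));
    }

    public static void main(String[] args) {
        double left = 0.5;
        double right = 1000.0;
        System.out.println(cpuCostBefore(left, right));    // 500.0
        System.out.println(cpuCostUnclamped(left, right)); // 250.0
        System.out.println(cpuCostClamped(left, right));   // 500.0
    }
}
```

With the clamp, inputs whose smaller estimate is at least one row are penalized
exactly as in the PR, while sub-one-row estimates fall back to the pre-PR cost
rather than receiving a discount.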
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]