github-actions[bot] commented on code in PR #64559:
URL: https://github.com/apache/doris/pull/64559#discussion_r3487837899


##########
fe/fe-core/src/main/java/org/apache/doris/nereids/stats/JoinEstimation.java:
##########
@@ -63,6 +74,22 @@ private static EqualPredicate normalizeEqualPredJoinCondition(EqualPredicate equ
         }
     }
 
+    static boolean hasTrustableEqualCondition(Statistics leftStats, Statistics rightStats, Join join) {
+        if (join.getEqualPredicates().isEmpty()) {
+            return false;
+        }
+        double rightStatsRowCount = StatsMathUtil.nonZeroDivisor(rightStats.getRowCount());
+        double leftStatsRowCount = StatsMathUtil.nonZeroDivisor(leftStats.getRowCount());
+        return join.getEqualPredicates().stream()
+                .map(expression -> normalizeEqualPredJoinCondition((EqualPredicate) expression, rightStats))
+                .anyMatch(equal -> {
+                    ColumnStatistic eqLeftColStats = ExpressionEstimation.estimate(equal.left(), leftStats);
+                    ColumnStatistic eqRightColStats = ExpressionEstimation.estimate(equal.right(), rightStats);
+                    return eqRightColStats.ndv / rightStatsRowCount > TRUSTABLE_UNIQ_THRESHOLD

Review Comment:
   `hasTrustableEqualCondition()` should reject unknown column stats before 
applying the NDV ratio. `ExpressionEstimation.visitSlotReference()` returns 
`ColumnStatistic.UNKNOWN` when a slot has no stats, and unknown stats are built 
with `ndv=1` and `isUnKnown=true`. For a DPHyp group such as:
   
   ```text
   Group{A,B}
     LogicalJoin(A.k = B.k)
       A stats rowCount=1, A.k=UNKNOWN
       B stats rowCount=N, B.k=UNKNOWN
   ```
   
   this helper evaluates `1 / nonZeroDivisor(1) > 0.9`, so 
`MemoStatsAndCostRecomputer.isTrustJoin()` gives the candidate a trust-join 
point even though `StatsCalculator` would mark the expression unreliable for 
unknown input slots. With 
`memo_logical_row_count_aggregation_policy=trust_join_count`, 
`filterCandidateStatisticsByPolicy()` can then prefer a candidate because its 
unknown equality was counted as trusted. Please check 
`!eqLeftColStats.isUnKnown && !eqRightColStats.isUnKnown` (or reuse the 
existing unknown-condition guard) before treating the equality as trustable.
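
A minimal, self-contained sketch of the suggested guard follows. It does not use the Doris classes: `ColStats`, `EqualityStats`, `unguarded` and `guarded` are hypothetical stand-ins, `nonZeroDivisor` is an assumed approximation of `StatsMathUtil.nonZeroDivisor`, and the left-side ratio check is an assumption, since the quoted hunk is cut off after the right-side comparison.

```java
import java.util.Arrays;
import java.util.List;

class TrustableEqualitySketch {
    // Mirrors the 0.9 threshold quoted in the review comment.
    static final double TRUSTABLE_UNIQ_THRESHOLD = 0.9;

    /** Hypothetical stand-in for ColumnStatistic: only the two fields the check reads. */
    static final class ColStats {
        // Unknown stats carry ndv=1 and isUnKnown=true, as described in the comment.
        static final ColStats UNKNOWN = new ColStats(1, true);
        final double ndv;
        final boolean isUnKnown;

        ColStats(double ndv, boolean isUnKnown) {
            this.ndv = ndv;
            this.isUnKnown = isUnKnown;
        }
    }

    /** Hypothetical pair of column stats for the two sides of one equality. */
    static final class EqualityStats {
        final ColStats left;
        final ColStats right;

        EqualityStats(ColStats left, ColStats right) {
            this.left = left;
            this.right = right;
        }
    }

    // Assumed behaviour: replace a zero divisor with 1 to avoid division by zero.
    static double nonZeroDivisor(double d) {
        return d == 0.0 ? 1.0 : d;
    }

    // NDV-to-row-count ratio test. Checking the left side too is an assumption.
    static boolean ratioIsHigh(EqualityStats eq, double leftRows, double rightRows) {
        return eq.right.ndv / rightRows > TRUSTABLE_UNIQ_THRESHOLD
                || eq.left.ndv / leftRows > TRUSTABLE_UNIQ_THRESHOLD;
    }

    // Behaviour without the guard: unknown stats can pass purely because ndv=1.
    static boolean unguarded(List<EqualityStats> eqs, double leftRowCount, double rightRowCount) {
        double leftRows = nonZeroDivisor(leftRowCount);
        double rightRows = nonZeroDivisor(rightRowCount);
        return eqs.stream().anyMatch(eq -> ratioIsHigh(eq, leftRows, rightRows));
    }

    // Behaviour with the guard: reject unknown stats before applying the ratio.
    static boolean guarded(List<EqualityStats> eqs, double leftRowCount, double rightRowCount) {
        double leftRows = nonZeroDivisor(leftRowCount);
        double rightRows = nonZeroDivisor(rightRowCount);
        return eqs.stream().anyMatch(eq ->
                !eq.left.isUnKnown && !eq.right.isUnKnown && ratioIsHigh(eq, leftRows, rightRows));
    }

    public static void main(String[] args) {
        // A.k = B.k with both sides UNKNOWN, A rowCount=1, B rowCount=1000.
        List<EqualityStats> unknown =
                Arrays.asList(new EqualityStats(ColStats.UNKNOWN, ColStats.UNKNOWN));
        System.out.println("unguarded: " + unguarded(unknown, 1, 1000));
        System.out.println("guarded:   " + guarded(unknown, 1, 1000));
    }
}
```

In this sketch the unguarded check accepts the all-unknown equality because `1 / 1` exceeds the threshold, while the guarded check rejects it and still accepts an equality whose known NDV is close to the row count.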



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

