Copilot commented on code in PR #6553:
URL: https://github.com/apache/hive/pull/6553#discussion_r3463940510
##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/FilterSelectivityEstimator.java:
##########
@@ -160,6 +159,33 @@ public Double visitCall(RexCall call) {
break;
}
+ case IS_NULL: {
+ if (childRel instanceof HiveTableScan) {
+ HiveTableScan hiveTableScan = (HiveTableScan) childRel;
+ if (hasMissingColumnStats(call, hiveTableScan)) {
+ selectivity = DEFAULT_COMPARISON_SELECTIVITY;
+ break;
+ }
+ double noOfNulls = getMaxNulls(call, hiveTableScan);
+ if (childCardinality >= noOfNulls) {
Review Comment:
`IS_NULL` handling calls `getColStat(...)` twice (via
`hasMissingColumnStats` and again via `getMaxNulls`). In
`RelOptHiveTable#getColStat` this can trigger repeated stats collection/logging
and potentially extra metastore work. Consider fetching column stats once, then
(a) validate missing/estimated/null stats and (b) compute max nulls from that
same list to avoid duplicate lookups.
##########
ql/src/test/org/apache/hadoop/hive/ql/optimizer/calcite/stats/TestFilterSelectivityEstimator.java:
##########
@@ -1116,6 +1116,32 @@ public void testBetweenWithCastToDecimal7s1() {
checkBetweenSelectivity(0, universe, total, cast, 100f, 0f);
}
+ @Test
+ public void testComputeIsNullSelectivityWithStats() {
+ stats.setNumNulls(3);
+
doReturn(Collections.singletonList(stats)).when(tableMock).getColStat(Collections.singletonList(0));
+ RexNode filter = REX_BUILDER.makeCall(SqlStdOperatorTable.IS_NULL,
inputRef0);
+ checkSelectivity(3f / VALUES.length, filter); // 3 / 13
+ }
+
+ @Test
+ public void testComputeIsNullSelectivityMissingStats() {
+
doReturn(Collections.emptyList()).when(tableMock).getColStat(Collections.singletonList(0));
+ RexNode filter = REX_BUILDER.makeCall(SqlStdOperatorTable.IS_NULL,
inputRef0);
+ checkSelectivity(1f / 3f, filter); // DEFAULT_COMPARISON_SELECTIVITY
+ }
+
+ @Test
+ public void testComputeIsNullSelectivityEstimatedStatsFallback() {
+ ColStatistics estimated = new ColStatistics();
+ estimated.setIsEstimated(true);
+ estimated.setNumNulls(0);
+
doReturn(Collections.singletonList(estimated)).when(tableMock).getColStat(Collections.singletonList(0));
+ RexNode filter = REX_BUILDER.makeCall(SqlStdOperatorTable.IS_NULL,
inputRef0);
+ // estimated stats should be treated as missing and fallback to
DEFAULT_COMPARISON_SELECTIVITY
+ checkSelectivity(1f / 3f, filter);
+ }
Review Comment:
`hasMissingColumnStats` explicitly treats `null` `ColStatistics` entries as
missing, but the tests only cover the empty-list and estimated-stats cases.
Adding a unit test where `getColStat` returns a list containing a `null`
element would lock in the intended fallback behavior.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]