Copilot commented on code in PR #12294:
URL: https://github.com/apache/gluten/pull/12294#discussion_r3457350250
##########
cpp-ch/local-engine/Parser/aggregate_function_parser/SimpleStatisticsFunctions.cpp:
##########
@@ -63,7 +63,14 @@ class AggregateFunctionParserStddev final : public
AggregateFunctionParser
return func_node;
}
};
+// for skewness
Review Comment:
The registration reuses `AggregateFunctionParserStddev` for skewness. Since
the class name suggests it’s stddev-specific, it’d help future readers to
document (at the call site) that this is intentional and primarily provides the
NaN→NULL conversion needed for Spark compatibility (count < 3).
##########
backends-clickhouse/src/test/scala/org/apache/gluten/execution/GlutenClickhouseCountDistinctSuite.scala:
##########
@@ -100,12 +100,17 @@ class GlutenClickhouseCountDistinctSuite extends
GlutenClickHouseWholeStageTrans
}
test("check count distinct with agg fallback") {
- // skewness agg is not supported, will cause fallback
val sql = "select count(distinct(a,b)) , skewness(b) from " +
"values (0, null,1), (0,null,1), (1, 1,1), (2, 2, 1) ,(2,2,2),(3,3,3) as
data(a,b,c)"
- assertThrows[UnsupportedOperationException] {
- spark.sql(sql).show
- }
+ val df = spark.sql(sql)
+ GlutenQueryComparisonTest.checkFallBack(df, noFallback = true)
Review Comment:
The new skewness coverage here doesn’t exercise the key behavioral change
described in the PR (ClickHouse `skewSamp` returning NaN when there are fewer
than 3 non-null inputs, and Gluten converting that NaN to NULL to match Spark).
With the current data (4 non-null `b` values), this can pass even if NaN→NULL
handling is broken.
Consider adding a small query (and a grouped variant) where each aggregation
sees < 3 non-null values, and also rename the test since it’s no longer
asserting fallback.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]