[
https://issues.apache.org/jira/browse/CALCITE-6203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17882641#comment-17882641
]
Ruben Q L commented on CALCITE-6203:
------------------------------------
Thanks [~nobigo] for the clarification. I was just about to add that this
conversion of AVG => SUM/COUNT is quite common; it also appears in
AggregateReduceFunctionsRule.
[~caicancai], I agree with [~nobigo]: if there are no further comments, I think
we should close this ticket. If you think that Calcite's test base is missing
some tests around AVG, I'd propose creating a separate ticket (task, not bug)
and working on it for a future release (more test coverage is always welcome :)
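As a minimal sketch (not Calcite's actual implementation), the AVG => SUM/COUNT
reduction amounts to the following; the class and method names here are
hypothetical, chosen only for illustration. Note that for an INTEGER column the
division is integer division, which matches the AVG_X=1 and AVG_X=2 values in
the test's expected results below.

```java
public class AvgRewriteSketch {
  // Hypothetical helper mirroring the reduction AVG(x) -> SUM(x) / COUNT(x),
  // as performed in spirit by AggregateReduceFunctionsRule.
  static int avgAsSumOverCount(int[] values) {
    int sum = 0;                 // SUM(x)
    for (int v : values) {
      sum += v;
    }
    return sum / values.length;  // SUM(x) / COUNT(x); integer division for INTEGER
  }

  public static void main(String[] args) {
    // Groups of x from VALUES2 in the test: x=1 -> {1, 1}, x=2 -> {2, 2, 2}
    System.out.println(avgAsSumOverCount(new int[] {1, 1}));     // prints 1
    System.out.println(avgAsSumOverCount(new int[] {2, 2, 2})); // prints 2
  }
}
```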
> Avg operator test in SparkAdapterTest
> -------------------------------------
>
> Key: CALCITE-6203
> URL: https://issues.apache.org/jira/browse/CALCITE-6203
> Project: Calcite
> Issue Type: Bug
> Components: spark
> Affects Versions: 1.36.0
> Reporter: Caican Cai
> Priority: Minor
> Fix For: 1.38.0
>
> Attachments: 2024-01-15 22-42-23屏幕截图.png
>
>
> I tested the avg operator in SparkAdapterTest:
> {code:java}
> @Test void testGroupBy() {
>   final String sql = "select sum(x) as SUM_X, min(y) as MIN_Y, max(y) as MAX_Y, avg(x) as AVG_X,"
>       + "count(*) as CNT_Y, count(distinct y) as CNT_DIST_Y\n"
>       + "from " + VALUES2 + "\n"
>       + "group by x";
>   final String plan = "PLAN="
>       + "EnumerableCalc(expr#0..5=[{inputs}], expr#6=[CAST($t1):INTEGER NOT NULL], expr#7=[CAST($t2):CHAR(1) NOT NULL], expr#8=[CAST($t3):CHAR(1) NOT NULL], expr#9=[CAST($t4):BIGINT NOT NULL], SUM_X=[$t6], MIN_Y=[$t7], MAX_Y=[$t8], AVG_X=[$t0], CNT_Y=[$t9], CNT_DIST_Y=[$t5])\n"
>       + "  EnumerableAggregate(group=[{0}], SUM_X=[MIN($2) FILTER $7], MIN_Y=[MIN($3) FILTER $7], MAX_Y=[MIN($4) FILTER $7], CNT_Y=[MIN($5) FILTER $7], CNT_DIST_Y=[COUNT($1) FILTER $6])\n"
>       + "    EnumerableCalc(expr#0..6=[{inputs}], expr#7=[0], expr#8=[=($t6, $t7)], expr#9=[1], expr#10=[=($t6, $t9)], proj#0..5=[{exprs}], $g_0=[$t8], $g_1=[$t10])\n"
>       + "      EnumerableAggregate(group=[{0, 1}], groups=[[{0, 1}, {0}]], SUM_X=[$SUM0($0)], MIN_Y=[MIN($1)], MAX_Y=[MAX($1)], CNT_Y=[COUNT()], $g=[GROUPING($0, $1)])\n"
>       + "        EnumerableValues(tuples=[[{ 1, 'a' }, { 2, 'b' }, { 1, 'b' }, { 2, 'c' }, { 2, 'c' }]])\n";
>   final String[] expectedResult = {
>       "SUM_X=2; MIN_Y=a; MAX_Y=b; AVG_X=1; CNT_Y=2; CNT_DIST_Y=2",
>       "SUM_X=6; MIN_Y=b; MAX_Y=c; AVG_X=2; CNT_Y=3; CNT_DIST_Y=2"
>   };
>   sql(sql).returnsUnordered(expectedResult)
>       .explainContains(plan);
> }
> {code}
>
> It can be seen that there is no corresponding avg operator in the
> EnumerableAggregate and EnumerableCalc nodes. Why is this? Spark SQL splits the
> avg operator into sum/count form. I don't know whether this is a bug.
>
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)