[
https://issues.apache.org/jira/browse/CALCITE-6203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Caican Cai updated CALCITE-6203:
--------------------------------
Description:
I test the avg operator in SparkAdapterTest
sql test:
{color:#cc7832}final {color}String sql = {color:#6a8759}"select sum(x) as
SUM_X, min(y) as MIN_Y, max(y) as MAX_Y, avg(x) as
AVG_X,"{color}{color:#6a8759} {color}+ {color:#6a8759}"count(*) as CNT_Y,
count(distinct y) as
CNT_DIST_Y{color}{color:#cc7832}\n{color}{color:#6a8759}"{color}{color:#6a8759}
{color}+ {color:#6a8759}"from " {color}+ {color:#9876aa}VALUES2 {color}+
{color:#6a8759}"{color}{color:#cc7832}\n{color}{color:#6a8759}"{color}{color:#6a8759}
{color}+ {color:#6a8759}"group by x"{color}{color:#cc7832};{color}
{color:#cc7832}result:{color}
{color:#cc7832}final {color}String plan =
{color:#6a8759}"PLAN="{color}{color:#6a8759} {color}+
{color:#6a8759}"EnumerableCalc(expr#0..5=[\\{inputs}],
expr#6=[CAST($t1):INTEGER NOT NULL], expr#7=[CAST($t2):CHAR(1) NOT NULL],
expr#8=[CAST($t3):CHAR(1) NOT NULL], expr#9=[CAST($t4):BIGINT NOT NULL],
SUM_X=[$t6], MIN_Y=[$t7], MAX_Y=[$t8], AVG_X=[$t0], CNT_Y=[$t9],
CNT_DIST_Y=[$t5]){color}{color:#cc7832}\n{color}{color:#6a8759}"{color}{color:#6a8759}
{color}+ {color:#6a8759}" EnumerableAggregate(group=[\\{0}], SUM_X=[MIN($2)
FILTER $7], MIN_Y=[MIN($3) FILTER $7], MAX_Y=[MIN($4) FILTER $7],
CNT_Y=[MIN($5) FILTER $7], CNT_DIST_Y=[COUNT($1) FILTER
$6]){color}{color:#cc7832}\n{color}{color:#6a8759}"{color}{color:#6a8759}
{color}+ {color:#6a8759}" EnumerableCalc(expr#0..6=[\\{inputs}], expr#7=[0],
expr#8=[=($t6, $t7)], expr#9=[1], expr#10=[=($t6, $t9)], proj#0..5=[\\{exprs}],
$g_0=[$t8],
$g_1=[$t10]){color}{color:#cc7832}\n{color}{color:#6a8759}"{color}{color:#6a8759}
{color}+ {color:#6a8759}" EnumerableAggregate(group=[\\{0, 1}], groups=[[\\{0,
1}, \\{0}]], SUM_X=[$SUM0($0)], MIN_Y=[MIN($1)], MAX_Y=[MAX($1)],
CNT_Y=[COUNT()], $g=[GROUPING($0,
$1)]){color}{color:#cc7832}\n{color}{color:#6a8759}"{color}{color:#6a8759}
{color}+ {color:#6a8759}" EnumerableValues(tuples=[[\\{ 1, 'a' }, \\{ 2, 'b' },
\\{ 1, 'b' }, \\{ 2, 'c' }, \\{ 2, 'c'
}]]){color}{color:#cc7832}\n{color}{color:#6a8759}"{color}{color:#cc7832};{color}
It can be found that there is no corresponding avg operator in
EnumerableAggregate and EnumerableCalc. Why is this? Spark sql will split the
avg operator into sum/count form. I don’t know if this is a bug.
was:
I test the avg operator in SparkAdapterTest
sql test:
{color:#cc7832}final {color}String sql = {color:#6a8759}"select sum(x) as
SUM_X, min(y) as MIN_Y, max(y) as MAX_Y, avg(x) as AVG_X,"
{color}{color:#6a8759} {color}+ {color:#6a8759}"count(*) as CNT_Y,
count(distinct y) as CNT_DIST_Y{color}{color:#cc7832}\n{color}{color:#6a8759}"
{color}{color:#6a8759} {color}+ {color:#6a8759}"from " {color}+
{color:#9876aa}VALUES2 {color}+
{color:#6a8759}"{color}{color:#cc7832}\n{color}{color:#6a8759}"
{color}{color:#6a8759} {color}+ {color:#6a8759}"group by
x"{color}{color:#cc7832};{color}
{color:#cc7832}result:{color}
{color:#cc7832}final {color}String plan = {color:#6a8759}"PLAN="
{color}{color:#6a8759} {color}+
{color:#6a8759}"EnumerableCalc(expr#0..5=[\{inputs}], expr#6=[CAST($t1):INTEGER
NOT NULL], expr#7=[CAST($t2):CHAR(1) NOT NULL], expr#8=[CAST($t3):CHAR(1) NOT
NULL], expr#9=[CAST($t4):BIGINT NOT NULL], SUM_X=[$t6], MIN_Y=[$t7],
MAX_Y=[$t8], AVG_X=[$t0], CNT_Y=[$t9],
CNT_DIST_Y=[$t5]){color}{color:#cc7832}\n{color}{color:#6a8759}"
{color}{color:#6a8759} {color}+ {color:#6a8759}"
EnumerableAggregate(group=[\{0}], SUM_X=[MIN($2) FILTER $7], MIN_Y=[MIN($3)
FILTER $7], MAX_Y=[MIN($4) FILTER $7], CNT_Y=[MIN($5) FILTER $7],
CNT_DIST_Y=[COUNT($1) FILTER $6]){color}{color:#cc7832}\n{color}{color:#6a8759}"
{color}{color:#6a8759} {color}+ {color:#6a8759}"
EnumerableCalc(expr#0..6=[\{inputs}], expr#7=[0], expr#8=[=($t6, $t7)],
expr#9=[1], expr#10=[=($t6, $t9)], proj#0..5=[\{exprs}], $g_0=[$t8],
$g_1=[$t10]){color}{color:#cc7832}\n{color}{color:#6a8759}"
{color}{color:#6a8759} {color}+ {color:#6a8759}"
EnumerableAggregate(group=[\{0, 1}], groups=[[\{0, 1}, \{0}]],
SUM_X=[$SUM0($0)], MIN_Y=[MIN($1)], MAX_Y=[MAX($1)], CNT_Y=[COUNT()],
$g=[GROUPING($0, $1)]){color}{color:#cc7832}\n{color}{color:#6a8759}"
{color}{color:#6a8759} {color}+ {color:#6a8759}" EnumerableValues(tuples=[[\{
1, 'a' }, \{ 2, 'b' }, \{ 1, 'b' }, \{ 2, 'c' }, \{ 2, 'c'
}]]){color}{color:#cc7832}\n{color}{color:#6a8759}"{color}{color:#cc7832};
{color}
It can be found that there is no corresponding avg operator in
EnumerableAggregate and EnumerableCalc. Why is this? Spark sql will split the
avg operator into sum/count form. I don’t know if this is a bug.
> Avg operator test in SparkAdapterTest
> -------------------------------------
>
> Key: CALCITE-6203
> URL: https://issues.apache.org/jira/browse/CALCITE-6203
> Project: Calcite
> Issue Type: Test
> Components: spark
> Affects Versions: 1.36.0
> Reporter: Caican Cai
> Priority: Minor
> Fix For: 1.37.0
>
>
> I test the avg operator in SparkAdapterTest
> sql test:
> {color:#cc7832}final {color}String sql = {color:#6a8759}"select sum(x) as
> SUM_X, min(y) as MIN_Y, max(y) as MAX_Y, avg(x) as
> AVG_X,"{color}{color:#6a8759} {color}+ {color:#6a8759}"count(*) as CNT_Y,
> count(distinct y) as
> CNT_DIST_Y{color}{color:#cc7832}\n{color}{color:#6a8759}"{color}{color:#6a8759}
> {color}+ {color:#6a8759}"from " {color}+ {color:#9876aa}VALUES2 {color}+
> {color:#6a8759}"{color}{color:#cc7832}\n{color}{color:#6a8759}"{color}{color:#6a8759}
> {color}+ {color:#6a8759}"group by x"{color}{color:#cc7832};{color}
>
> {color:#cc7832}result:{color}
> {color:#cc7832}final {color}String plan =
> {color:#6a8759}"PLAN="{color}{color:#6a8759} {color}+
> {color:#6a8759}"EnumerableCalc(expr#0..5=[\\{inputs}],
> expr#6=[CAST($t1):INTEGER NOT NULL], expr#7=[CAST($t2):CHAR(1) NOT NULL],
> expr#8=[CAST($t3):CHAR(1) NOT NULL], expr#9=[CAST($t4):BIGINT NOT NULL],
> SUM_X=[$t6], MIN_Y=[$t7], MAX_Y=[$t8], AVG_X=[$t0], CNT_Y=[$t9],
> CNT_DIST_Y=[$t5]){color}{color:#cc7832}\n{color}{color:#6a8759}"{color}{color:#6a8759}
> {color}+ {color:#6a8759}" EnumerableAggregate(group=[\\{0}], SUM_X=[MIN($2)
> FILTER $7], MIN_Y=[MIN($3) FILTER $7], MAX_Y=[MIN($4) FILTER $7],
> CNT_Y=[MIN($5) FILTER $7], CNT_DIST_Y=[COUNT($1) FILTER
> $6]){color}{color:#cc7832}\n{color}{color:#6a8759}"{color}{color:#6a8759}
> {color}+ {color:#6a8759}" EnumerableCalc(expr#0..6=[\\{inputs}], expr#7=[0],
> expr#8=[=($t6, $t7)], expr#9=[1], expr#10=[=($t6, $t9)],
> proj#0..5=[\\{exprs}], $g_0=[$t8],
> $g_1=[$t10]){color}{color:#cc7832}\n{color}{color:#6a8759}"{color}{color:#6a8759}
> {color}+ {color:#6a8759}" EnumerableAggregate(group=[\\{0, 1}],
> groups=[[\\{0, 1}, \\{0}]], SUM_X=[$SUM0($0)], MIN_Y=[MIN($1)],
> MAX_Y=[MAX($1)], CNT_Y=[COUNT()], $g=[GROUPING($0,
> $1)]){color}{color:#cc7832}\n{color}{color:#6a8759}"{color}{color:#6a8759}
> {color}+ {color:#6a8759}" EnumerableValues(tuples=[[\\{ 1, 'a' }, \\{ 2, 'b'
> }, \\{ 1, 'b' }, \\{ 2, 'c' }, \\{ 2, 'c'
> }]]){color}{color:#cc7832}\n{color}{color:#6a8759}"{color}{color:#cc7832};{color}
>
> It can be found that there is no corresponding avg operator in
> EnumerableAggregate and EnumerableCalc. Why is this? Spark sql will split the
> avg operator into sum/count form. I don’t know if this is a bug.
>
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)