alex-plekhanov commented on a change in pull request #9070:
URL: https://github.com/apache/ignite/pull/9070#discussion_r635559297
##########
File path:
modules/calcite/src/test/java/org/apache/ignite/internal/processors/query/calcite/planner/AggregatePlannerTest.java
##########
@@ -209,6 +213,42 @@ public void distribution() throws Exception {
assertEquals(IgniteDistributions.single(),
TraitUtils.distribution(rdcAgg));
}
+ /**
+ * @throws Exception If failed.
+ */
+ @Test
+ public void expandDistinctAggregates() throws Exception {
+ TestTable tbl = createAffinityTable()
+ .addIndex(RelCollations.of(ImmutableIntList.of(3, 1, 0)),
"idx_val0")
+ .addIndex(RelCollations.of(ImmutableIntList.of(3, 2, 0)),
"idx_val1");
+
+ IgniteSchema publicSchema = new IgniteSchema("PUBLIC");
+
+ publicSchema.addTable("TEST", tbl);
+
+ String sql = "SELECT /*+ EXPAND_DISTINCT_AGG */ SUM(DISTINCT val0),
AVG(DISTINCT val1) FROM test GROUP BY grp0";
Review comment:
There is a weird plan for this query:
```
IgniteProject(SUM(DISTINCT VAL0)=[$3], AVG(DISTINCT VAL1)=[$1]): rowcount =
1.0, cumulative cost = IgniteCost [rowCount=623.0, cpu=629.0, memory=34.0,
io=0.0, network=1600.0], id = 91734
IgniteMergeJoin(condition=[IS NOT DISTINCT FROM($2, $0)],
joinType=[inner], variablesSet=[[]], leftCollation=[[0]],
rightCollation=[[0]]): rowcount = 1.0, cumulative cost = IgniteCost
[rowCount=622.0, cpu=628.0, memory=34.0, io=0.0, network=1600.0], id = 91733
IgniteSingleSortAggregate(group=[{0}], AVG(DISTINCT VAL1)=[AVG($1)],
collation=[[0]]): rowcount = 1.0, cumulative cost = IgniteCost [rowCount=310.0,
cpu=310.0, memory=17.0, io=0.0, network=800.0], id = 91729
IgniteSingleSortAggregate(group=[{0, 1}], collation=[[0, 1]]):
rowcount = 10.0, cumulative cost = IgniteCost [rowCount=300.0, cpu=300.0,
memory=8.0, io=0.0, network=800.0], id = 91728
IgniteExchange(distribution=[single]): rowcount = 100.0, cumulative
cost = IgniteCost [rowCount=200.0, cpu=200.0, memory=0.0, io=0.0,
network=800.0], id = 91727
IgniteIndexScan(table=[[PUBLIC, TEST]], index=[idx_val1],
projects=[[$t1, $t0]], requiredColumns=[{2, 3}]): rowcount = 100.0, cumulative
cost = IgniteCost [rowCount=100.0, cpu=100.0, memory=0.0, io=0.0, network=0.0],
id = 230
IgniteSingleSortAggregate(group=[{0}], SUM(DISTINCT VAL0)=[SUM($1)],
collation=[[0]]): rowcount = 1.0, cumulative cost = IgniteCost [rowCount=310.0,
cpu=310.0, memory=17.0, io=0.0, network=800.0], id = 91732
IgniteSingleSortAggregate(group=[{0, 1}], collation=[[0, 1]]):
rowcount = 10.0, cumulative cost = IgniteCost [rowCount=300.0, cpu=300.0,
memory=8.0, io=0.0, network=800.0], id = 91731
IgniteExchange(distribution=[single]): rowcount = 100.0, cumulative
cost = IgniteCost [rowCount=200.0, cpu=200.0, memory=0.0, io=0.0,
network=800.0], id = 91730
IgniteIndexScan(table=[[PUBLIC, TEST]], index=[idx_val0],
projects=[[$t1, $t0]], requiredColumns=[{1, 3}]): rowcount = 100.0, cumulative
cost = IgniteCost [rowCount=100.0, cpu=100.0, memory=0.0, io=0.0, network=0.0],
id = 440
```
Single sort aggregate is used instead of map/reduce. Even if I disable the
rule with single sort, the plan is still weird, it uses map phase after the
exchange:
```
IgniteReduceSortAggregate(rowType=[RecordType(JavaType(class
java.lang.Integer) GRP0, JavaType(class java.lang.Integer) VAL0)], group=[{0,
1}], collation=[[0, 1]]): rowcount = 10.0, cumulative cost =
IgniteCost[rowCount=310.0, cpu=310.0, memory=8.0, io=0.0, network=800.0], id =
91357
IgniteMapSortAggregate(group=[{0, 1}], collation=[[0, 1]]):
rowcount = 10.0, cumulative cost = IgniteCost [rowCount=300.0, cpu=300.0,
memory=8.0, io=0.0, network=800.0], id = 91356
IgniteExchange(distribution=[single]): rowcount = 100.0,
cumulative cost = IgniteCost [rowCount=200.0, cpu=200.0, memory=0.0, io=0.0,
network=800.0], id = 91355
IgniteIndexScan(table=[[PUBLIC, TEST]], index=[idx_val0],
projects=[[$t1, $t0]], requiredColumns=[{1, 3}]): rowcount = 100.0, cumulative
cost = IgniteCost [rowCount=100.0, cpu=100.0, memory=0.0, io=0.0, network=0.0],
id = 437
```
Perhaps there is something wrong with cost calculation for aggregates. I
think for sort aggregates with "expand distinct" in this test the cost should
be better than without "expand distinct" and the rule should be applied without
the hint. But now, without the hint, the cost is much better and the rule is
not applied automatically (perhaps without the hint it never be applied ever
for other queries too). The plan without the hint:
```
IgniteProject(SUM(DISTINCT VAL0)=[$1], AVG(DISTINCT VAL1)=[$2]): rowcount =
10.0, cumulative cost = IgniteCost [rowCount=230.0, cpu=230.0, memory=24.0,
io=0.0, network=80.0], id = 42935
IgniteReduceSortAggregate(rowType=[RecordType(JavaType(class
java.lang.Integer) GRP0, JavaType(class java.lang.Integer) SUM(DISTINCT VAL0),
JavaType(class java.lang.Integer) AVG(DISTINCT VAL1))], group=[{0}],
SUM(DISTINCT VAL0)=[SUM(DISTINCT $1)], AVG(DISTINCT VAL1)=[AVG(DISTINCT $2)],
collation=[[0]]): rowcount = 10.0, cumulative cost = IgniteCost
[rowCount=220.0, cpu=220.0, memory=24.0, io=0.0, network=80.0], id = 42934
IgniteExchange(distribution=[single]): rowcount = 10.0, cumulative cost
= IgniteCost [rowCount=210.0, cpu=210.0, memory=24.0, io=0.0, network=80.0], id
= 42933
IgniteMapSortAggregate(group=[{0}], SUM(DISTINCT VAL0)=[SUM(DISTINCT
$1)], AVG(DISTINCT VAL1)=[AVG(DISTINCT $2)], collation=[[0]]): rowcount = 10.0,
cumulative cost = IgniteCost [rowCount=200.0, cpu=200.0, memory=24.0, io=0.0,
network=0.0], id = 42932
IgniteIndexScan(table=[[PUBLIC, TEST]], index=[idx_val1],
projects=[[$t2, $t0, $t1]], requiredColumns=[{1, 2, 3}]): rowcount = 100.0,
cumulative cost = IgniteCost [rowCount=100.0, cpu=100.0, memory=0.0, io=0.0,
network=0.0], id = 273
```
In this plan `DISTINCT VAL0` and `DISTINCT VAL1` will require hash maps for
each aggregate call on both map and reduce phases, but memory consumption for
the whole plan is only 24 bytes.
##########
File path:
modules/calcite/src/main/java/org/apache/ignite/internal/processors/query/calcite/prepare/PlannerPhase.java
##########
@@ -134,6 +135,8 @@
.predicate(Aggregate::isSimple)
.anyInputs())).toRule(),
+ AggregateExpandDistinctAggregatesRule.Config.JOIN.toRule(),
Review comment:
Perhaps, it's more readable to use the already created instance
`CoreRules.AGGREGATE_EXPAND_DISTINCT_AGGREGATES_TO_JOIN`
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]