[GitHub] [ignite] alex-plekhanov commented on a change in pull request #9070: IGNITE-14544 Calcite engine. Support DISTINCT aggregates

GitBox Wed, 19 May 2021 13:31:02 -0700


alex-plekhanov commented on a change in pull request #9070:
URL: https://github.com/apache/ignite/pull/9070#discussion_r635559297




##########
File path: 
modules/calcite/src/test/java/org/apache/ignite/internal/processors/query/calcite/planner/AggregatePlannerTest.java
##########
@@ -209,6 +213,42 @@ public void distribution() throws Exception {
         assertEquals(IgniteDistributions.single(), 
TraitUtils.distribution(rdcAgg));
     }
 
+    /**
+     * @throws Exception If failed.
+     */
+    @Test
+    public void expandDistinctAggregates() throws Exception {
+        TestTable tbl = createAffinityTable()
+            .addIndex(RelCollations.of(ImmutableIntList.of(3, 1, 0)), 
"idx_val0")
+            .addIndex(RelCollations.of(ImmutableIntList.of(3, 2, 0)), 
"idx_val1");
+
+        IgniteSchema publicSchema = new IgniteSchema("PUBLIC");
+
+        publicSchema.addTable("TEST", tbl);
+
+        String sql = "SELECT /*+ EXPAND_DISTINCT_AGG */ SUM(DISTINCT val0), 
AVG(DISTINCT val1) FROM test GROUP BY grp0";

Review comment:
       There is a weird plan for this query:
   ```
   IgniteProject(SUM(DISTINCT VAL0)=[$3], AVG(DISTINCT VAL1)=[$1]): rowcount = 
1.0, cumulative cost = IgniteCost [rowCount=623.0, cpu=629.0, memory=34.0, 
io=0.0, network=1600.0], id = 91734
     IgniteMergeJoin(condition=[IS NOT DISTINCT FROM($2, $0)], 
joinType=[inner], variablesSet=[[]], leftCollation=[[0]], 
rightCollation=[[0]]): rowcount = 1.0, cumulative cost = IgniteCost 
[rowCount=622.0, cpu=628.0, memory=34.0, io=0.0, network=1600.0], id = 91733
       IgniteSingleSortAggregate(group=[{0}], AVG(DISTINCT VAL1)=[AVG($1)], 
collation=[[0]]): rowcount = 1.0, cumulative cost = IgniteCost [rowCount=310.0, 
cpu=310.0, memory=17.0, io=0.0, network=800.0], id = 91729
         IgniteSingleSortAggregate(group=[{0, 1}], collation=[[0, 1]]): 
rowcount = 10.0, cumulative cost = IgniteCost [rowCount=300.0, cpu=300.0, 
memory=8.0, io=0.0, network=800.0], id = 91728
           IgniteExchange(distribution=[single]): rowcount = 100.0, cumulative 
cost = IgniteCost [rowCount=200.0, cpu=200.0, memory=0.0, io=0.0, 
network=800.0], id = 91727
             IgniteIndexScan(table=[[PUBLIC, TEST]], index=[idx_val1], 
projects=[[$t1, $t0]], requiredColumns=[{2, 3}]): rowcount = 100.0, cumulative 
cost = IgniteCost [rowCount=100.0, cpu=100.0, memory=0.0, io=0.0, network=0.0], 
id = 230
       IgniteSingleSortAggregate(group=[{0}], SUM(DISTINCT VAL0)=[SUM($1)], 
collation=[[0]]): rowcount = 1.0, cumulative cost = IgniteCost [rowCount=310.0, 
cpu=310.0, memory=17.0, io=0.0, network=800.0], id = 91732
         IgniteSingleSortAggregate(group=[{0, 1}], collation=[[0, 1]]): 
rowcount = 10.0, cumulative cost = IgniteCost [rowCount=300.0, cpu=300.0, 
memory=8.0, io=0.0, network=800.0], id = 91731
           IgniteExchange(distribution=[single]): rowcount = 100.0, cumulative 
cost = IgniteCost [rowCount=200.0, cpu=200.0, memory=0.0, io=0.0, 
network=800.0], id = 91730
             IgniteIndexScan(table=[[PUBLIC, TEST]], index=[idx_val0], 
projects=[[$t1, $t0]], requiredColumns=[{1, 3}]): rowcount = 100.0, cumulative 
cost = IgniteCost [rowCount=100.0, cpu=100.0, memory=0.0, io=0.0, network=0.0], 
id = 440
   ```
   Single sort aggregate is used instead of map/reduce. Even if I disable the 
rule with single sort, the plan is still weird, it uses map phase after the 
exchange:
   ```
           IgniteReduceSortAggregate(rowType=[RecordType(JavaType(class 
java.lang.Integer) GRP0, JavaType(class java.lang.Integer) VAL0)], group=[{0, 
1}], collation=[[0, 1]]): rowcount = 10.0, cumulative cost = 
IgniteCost[rowCount=310.0, cpu=310.0, memory=8.0, io=0.0, network=800.0], id = 
91357
             IgniteMapSortAggregate(group=[{0, 1}], collation=[[0, 1]]): 
rowcount = 10.0, cumulative cost = IgniteCost [rowCount=300.0, cpu=300.0, 
memory=8.0, io=0.0, network=800.0], id = 91356
               IgniteExchange(distribution=[single]): rowcount = 100.0, 
cumulative cost = IgniteCost [rowCount=200.0, cpu=200.0, memory=0.0, io=0.0, 
network=800.0], id = 91355
                 IgniteIndexScan(table=[[PUBLIC, TEST]], index=[idx_val0], 
projects=[[$t1, $t0]], requiredColumns=[{1, 3}]): rowcount = 100.0, cumulative 
cost = IgniteCost [rowCount=100.0, cpu=100.0, memory=0.0, io=0.0, network=0.0], 
id = 437
   ``` 
   Perhaps there is something wrong with cost calculation for aggregates. I 
think for sort aggregates with "expand distinct" in this test the cost should 
be better than without "expand distinct" and the rule should be applied without 
the hint. But now, without the hint, the cost is much better and the rule is 
not applied automatically (perhaps without the hint it never be applied ever 
for other queries too). The plan without the hint:
   ```
   IgniteProject(SUM(DISTINCT VAL0)=[$1], AVG(DISTINCT VAL1)=[$2]): rowcount = 
10.0, cumulative cost = IgniteCost [rowCount=230.0, cpu=230.0, memory=24.0, 
io=0.0, network=80.0], id = 42935
     IgniteReduceSortAggregate(rowType=[RecordType(JavaType(class 
java.lang.Integer) GRP0, JavaType(class java.lang.Integer) SUM(DISTINCT VAL0), 
JavaType(class java.lang.Integer) AVG(DISTINCT VAL1))], group=[{0}], 
SUM(DISTINCT VAL0)=[SUM(DISTINCT $1)], AVG(DISTINCT VAL1)=[AVG(DISTINCT $2)], 
collation=[[0]]): rowcount = 10.0, cumulative cost = IgniteCost 
[rowCount=220.0, cpu=220.0, memory=24.0, io=0.0, network=80.0], id = 42934
       IgniteExchange(distribution=[single]): rowcount = 10.0, cumulative cost 
= IgniteCost [rowCount=210.0, cpu=210.0, memory=24.0, io=0.0, network=80.0], id 
= 42933
         IgniteMapSortAggregate(group=[{0}], SUM(DISTINCT VAL0)=[SUM(DISTINCT 
$1)], AVG(DISTINCT VAL1)=[AVG(DISTINCT $2)], collation=[[0]]): rowcount = 10.0, 
cumulative cost = IgniteCost [rowCount=200.0, cpu=200.0, memory=24.0, io=0.0, 
network=0.0], id = 42932
           IgniteIndexScan(table=[[PUBLIC, TEST]], index=[idx_val1], 
projects=[[$t2, $t0, $t1]], requiredColumns=[{1, 2, 3}]): rowcount = 100.0, 
cumulative cost = IgniteCost [rowCount=100.0, cpu=100.0, memory=0.0, io=0.0, 
network=0.0], id = 273
   ```
   In this plan `DISTINCT VAL0`  and `DISTINCT VAL1` will require hash maps for 
each aggregate call on both map and reduce phases, but memory consumption for 
the whole plan is only 24 bytes.

##########
File path: 
modules/calcite/src/main/java/org/apache/ignite/internal/processors/query/calcite/prepare/PlannerPhase.java
##########
@@ -134,6 +135,8 @@
                                         .predicate(Aggregate::isSimple)
                                         .anyInputs())).toRule(),
 
+                    AggregateExpandDistinctAggregatesRule.Config.JOIN.toRule(),

Review comment:
       Perhaps, it's more readable to use the already created instance 
`CoreRules.AGGREGATE_EXPAND_DISTINCT_AGGREGATES_TO_JOIN`




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

[GitHub] [ignite] alex-plekhanov commented on a change in pull request #9070: IGNITE-14544 Calcite engine. Support DISTINCT aggregates

Reply via email to