jinxing64 opened a new pull request #1434: [CALCITE-2970] Performance issue 
when enabling merge join. 
URL: https://github.com/apache/calcite/pull/1434
 
 
   Currently `AbstractConverter` is disabled for `EnumerableConvention` (and other conventions), so there is no chance to apply a sort-merge join (SMJ) when the join inputs are not sorted.
   
   `AbstractConverter` is one way to enable SMJ. However, relying on the expansion of `AbstractConverter` can be expensive: it may cause a rule-matching explosion in `VolcanoPlanner`.
   I tested `JdbcTest#testJoinManyWay`, running `checkJoinNWay(4)`.
   When `EnumerableConvention#useAbstractConvertersForConversion` is false:
   ```
   Timing Cost: 3 seconds
   Rule Matched Times:
   org.apache.calcite.rel.rules.AggregateProjectMergeRule:380
   org.apache.calcite.rel.rules.ProjectFilterTransposeRule:6
   org.apache.calcite.adapter.enumerable.EnumerableProjectRule:196
   org.apache.calcite.adapter.enumerable.EnumerableJoinRule:23
   org.apache.calcite.adapter.enumerable.EnumerableFilterRule:16
   org.apache.calcite.rel.rules.FilterProjectTransposeRule:20
   org.apache.calcite.rel.rules.FilterJoinRule$FilterIntoJoinRule:12
   org.apache.calcite.rel.rules.JoinCommuteRule:23
   org.apache.calcite.rel.rules.ProjectMergeRule:3387
   org.apache.calcite.adapter.enumerable.EnumerableAggregateRule:17
   org.apache.calcite.adapter.enumerable.EnumerableMergeJoinRule:18
   org.apache.calcite.rel.rules.JoinPushThroughJoinRule:45
   ```
   But when EnumerableConvention#useAbstractConvertersForConversion is true:
   ```
   Timing Cost: 52 seconds
   Rule Matched Times:
   org.apache.calcite.rel.rules.AggregateProjectMergeRule:6240
   org.apache.calcite.rel.rules.ProjectFilterTransposeRule:6
   org.apache.calcite.adapter.enumerable.EnumerableProjectRule:3190
   org.apache.calcite.adapter.enumerable.EnumerableJoinRule:739
   org.apache.calcite.adapter.enumerable.EnumerableSortRule:68
   org.apache.calcite.rel.rules.FilterProjectTransposeRule:20
   org.apache.calcite.rel.rules.SortRemoveRule:111
   org.apache.calcite.adapter.enumerable.EnumerableAggregateRule:625
   org.apache.calcite.adapter.enumerable.EnumerableMergeJoinRule:738
   org.apache.calcite.plan.volcano.AbstractConverter$ExpandConversionRule:70
   org.apache.calcite.adapter.enumerable.EnumerableFilterRule:16
   org.apache.calcite.rel.rules.FilterJoinRule$FilterIntoJoinRule:12
   org.apache.calcite.rel.rules.JoinCommuteRule:763
   org.apache.calcite.rel.rules.ProjectMergeRule:44769
   org.apache.calcite.rel.rules.JoinPushThroughJoinRule:1827
   ```
   When I ran `JdbcTest#testJoinManyWay` with `checkJoinNWay(6)` and `EnumerableConvention#useAbstractConvertersForConversion` set to true, it never returned.
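   To put the two logs side by side, the small Java program below just totals the per-rule counts copied from them (the class and field names are illustrative, not part of Calcite). The totals come to 4,143 rule matches without abstract converters versus 59,194 with them, roughly 14 times as many, and `ProjectMergeRule` alone accounts for 44,769 of the latter.

```java
import java.util.Map;

/** Totals the rule-match counts from the two checkJoinNWay(4) logs above. */
final class RuleMatchTotals {
  // Counts from the log with useAbstractConvertersForConversion = false.
  static final Map<String, Integer> WITHOUT_CONVERTERS = Map.ofEntries(
      Map.entry("AggregateProjectMergeRule", 380),
      Map.entry("ProjectFilterTransposeRule", 6),
      Map.entry("EnumerableProjectRule", 196),
      Map.entry("EnumerableJoinRule", 23),
      Map.entry("EnumerableFilterRule", 16),
      Map.entry("FilterProjectTransposeRule", 20),
      Map.entry("FilterJoinRule$FilterIntoJoinRule", 12),
      Map.entry("JoinCommuteRule", 23),
      Map.entry("ProjectMergeRule", 3387),
      Map.entry("EnumerableAggregateRule", 17),
      Map.entry("EnumerableMergeJoinRule", 18),
      Map.entry("JoinPushThroughJoinRule", 45));

  // Counts from the log with useAbstractConvertersForConversion = true.
  static final Map<String, Integer> WITH_CONVERTERS = Map.ofEntries(
      Map.entry("AggregateProjectMergeRule", 6240),
      Map.entry("ProjectFilterTransposeRule", 6),
      Map.entry("EnumerableProjectRule", 3190),
      Map.entry("EnumerableJoinRule", 739),
      Map.entry("EnumerableSortRule", 68),
      Map.entry("FilterProjectTransposeRule", 20),
      Map.entry("SortRemoveRule", 111),
      Map.entry("EnumerableAggregateRule", 625),
      Map.entry("EnumerableMergeJoinRule", 738),
      Map.entry("AbstractConverter$ExpandConversionRule", 70),
      Map.entry("EnumerableFilterRule", 16),
      Map.entry("FilterJoinRule$FilterIntoJoinRule", 12),
      Map.entry("JoinCommuteRule", 763),
      Map.entry("ProjectMergeRule", 44769),
      Map.entry("JoinPushThroughJoinRule", 1827));

  /** Sums the match counts of every rule in one log. */
  static int total(Map<String, Integer> counts) {
    return counts.values().stream().mapToInt(Integer::intValue).sum();
  }

  public static void main(String[] args) {
    int without = total(WITHOUT_CONVERTERS);
    int withConverters = total(WITH_CONVERTERS);
    System.out.println("without converters: " + without + ", with converters: " + withConverters);
  }
}
```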
   
   Note that, by the design and implementation of `VolcanoRuleCall#matchRecurse`, a rule is triggered whenever its root operand or a child operand is matched by a newly created/registered RelNode.
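   The effect of this matching behavior can be illustrated with a deliberately simplified model. This is hypothetical code, not Calcite's implementation, which also handles operand trees, traits, and subsets. Here a two-operand rule is checked in both roles whenever a node is registered, and the child operand matches any node in the parent's input equivalence set. Under these assumptions, every extra node registered into a set fires the rule once more for each parent that reads that set, so extra nodes (plausibly including those produced while expanding converters) multiply the number of matches.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model (not Calcite code): a rule fires when a newly registered node matches any operand. */
final class ToyMatchRecurse {
  /** A node of some operator type, living in equivalence set `set`, reading set `inputSet` (-1 = leaf). */
  record Node(String type, int set, int inputSet) {}

  private final List<Node> registered = new ArrayList<>();
  private final String parentType; // type required by the rule's root operand
  private final String childType;  // type required by the rule's child operand
  int fired;                       // number of rule matches so far

  ToyMatchRecurse(String parentType, String childType) {
    this.parentType = parentType;
    this.childType = childType;
  }

  void register(Node n) {
    // New node as root operand: pair it with every matching node already in its input set.
    if (n.type().equals(parentType)) {
      for (Node c : registered) {
        if (c.set() == n.inputSet() && c.type().equals(childType)) fired++;
      }
    }
    // New node as child operand: pair it with every matching parent that reads its set.
    if (n.type().equals(childType)) {
      for (Node p : registered) {
        if (p.inputSet() == n.set() && p.type().equals(parentType)) fired++;
      }
    }
    registered.add(n);
  }

  public static void main(String[] args) {
    // A Project-over-Project rule, in the spirit of ProjectMergeRule.
    ToyMatchRecurse planner = new ToyMatchRecurse("Project", "Project");
    planner.register(new Node("Scan", 0, -1));
    planner.register(new Node("Project", 1, 0));
    for (int s = 2; s < 5; s++) planner.register(new Node("Project", s, 1)); // 3 parents of set 1
    int before = planner.fired;
    for (int k = 0; k < 4; k++) planner.register(new Node("Project", 1, 0)); // 4 more nodes in set 1
    System.out.println(before + " -> " + planner.fired); // prints "3 -> 15"
  }
}
```

   With three parents reading set 1, registering four more equivalent nodes into that set adds 4 × 3 = 12 matches, so the count grows multiplicatively rather than additively.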
   
   This PR proposes constructing the `EnumerableSort` operators for the join inputs when creating `EnumerableMergeJoin`, so that no `AbstractConverter` is created and the extra matching effort during optimization is avoided.
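   The idea can be sketched with a toy plan model in plain Java (16+, for records). The `Plan`, `Scan`, `Sort`, and `MergeJoin` types and the `sortIfNeeded` helper are hypothetical stand-ins, not Calcite's `RelNode`, `EnumerableSort`, or `EnumerableMergeJoin` APIs, and only ascending collations are modeled. The sketch shows a sort being placed on an unsorted input at the moment the merge join is built, rather than relying on an `AbstractConverter` being created and expanded later.

```java
import java.util.List;

/** Builds toy merge joins, sorting each input on its join keys only when needed. */
final class MergeJoinFactory {
  /** Returns input unchanged if it is already sorted with keys as a prefix, else wraps it in a Sort. */
  static Plan sortIfNeeded(Plan input, List<Integer> keys) {
    List<Integer> sorted = input.sortedOn();
    boolean satisfied = sorted.size() >= keys.size()
        && sorted.subList(0, keys.size()).equals(keys);
    return satisfied ? input : new Sort(input, keys);
  }

  static MergeJoin create(Plan left, Plan right, List<Integer> leftKeys, List<Integer> rightKeys) {
    return new MergeJoin(sortIfNeeded(left, leftKeys), sortIfNeeded(right, rightKeys),
        leftKeys, rightKeys);
  }

  public static void main(String[] args) {
    Plan emps = new Scan("emps", List.of());    // unsorted input
    Plan depts = new Scan("depts", List.of(0)); // already sorted on column 0
    // Join emps column 1 = depts column 0; prints MergeJoin(Sort([1], Scan(emps)), Scan(depts))
    System.out.println(create(emps, depts, List.of(1), List.of(0)).explain());
  }
}

/** Hypothetical plan node; sortedOn() lists the columns its output is sorted on (ascending). */
interface Plan {
  List<Integer> sortedOn();
  String explain();
}

record Scan(String table, List<Integer> sortedOn) implements Plan {
  public String explain() { return "Scan(" + table + ")"; }
}

record Sort(Plan input, List<Integer> sortedOn) implements Plan {
  public String explain() { return "Sort(" + sortedOn + ", " + input.explain() + ")"; }
}

record MergeJoin(Plan left, Plan right, List<Integer> leftKeys, List<Integer> rightKeys)
    implements Plan {
  // Left columns come first in the output, so it stays sorted on the left keys.
  public List<Integer> sortedOn() { return leftKeys; }
  public String explain() { return "MergeJoin(" + left.explain() + ", " + right.explain() + ")"; }
}
```

   Here `emps` is unsorted and gets a sort on its join key, while `depts` is already sorted on its key and is used as-is, so no conversion step is left for the planner to discover.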
   
   With this change, `JdbcTest#testJoinManyWay` finishes in 8 seconds, and `VolcanoPlannerTest#testMergeJoin` can also be enabled.
   
   This PR also proposes adding a config option that indicates whether merge join is enabled.
   As I understand it, sort-merge join and hash join are both implementations of Join. By design, every join type should be supported by both, and users can choose between them based on the scenario and data characteristics.
   Moreover, if `ENUMERABLE_MERGE_JOIN_RULE` were enabled by default, many test plan checks that expect `EnumerableHashJoin` would need to be modified, and I'm hesitant to make that much change.
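   As a rough illustration of why both joins are interchangeable implementations, the toy below (plain Java 16+, not Calcite's `EnumerableHashJoin` or `EnumerableMergeJoin`) computes the same inner equi-join on the first column once with a hash join and once with a sort-merge join. The `mergeJoinEnabled` parameter is a hypothetical stand-in for the proposed config option, whose actual name is not given here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy comparison of hash join and sort-merge join; rows are {key, value} pairs. */
final class JoinChoice {
  /** Picks an implementation; mergeJoinEnabled is a hypothetical stand-in for the proposed config. */
  static List<int[]> join(int[][] left, int[][] right, boolean mergeJoinEnabled) {
    return mergeJoinEnabled ? mergeJoin(left, right) : hashJoin(left, right);
  }

  /** Inner equi-join on column 0: build a hash table on the right input, probe it with the left. */
  static List<int[]> hashJoin(int[][] left, int[][] right) {
    Map<Integer, List<int[]>> table = new HashMap<>();
    for (int[] r : right) {
      table.computeIfAbsent(r[0], k -> new ArrayList<>()).add(r);
    }
    List<int[]> out = new ArrayList<>();
    for (int[] l : left) {
      for (int[] r : table.getOrDefault(l[0], List.of())) {
        out.add(new int[] {l[0], l[1], r[1]});
      }
    }
    return out;
  }

  /** Same join: sort both inputs on the key, then advance two cursors, pairing equal-key runs. */
  static List<int[]> mergeJoin(int[][] left, int[][] right) {
    int[][] ls = left.clone();
    int[][] rs = right.clone();
    Comparator<int[]> byKey = Comparator.comparingInt(row -> row[0]);
    Arrays.sort(ls, byKey);
    Arrays.sort(rs, byKey);
    List<int[]> out = new ArrayList<>();
    int i = 0;
    int j = 0;
    while (i < ls.length && j < rs.length) {
      int cmp = Integer.compare(ls[i][0], rs[j][0]);
      if (cmp < 0) {
        i++;
      } else if (cmp > 0) {
        j++;
      } else {
        int key = ls[i][0];
        int runStart = j;
        while (j < rs.length && rs[j][0] == key) j++; // j now ends the right-side run
        for (; i < ls.length && ls[i][0] == key; i++) {
          for (int k = runStart; k < j; k++) {
            out.add(new int[] {key, ls[i][1], rs[k][1]});
          }
        }
      }
    }
    return out;
  }

  /** Renders rows as sorted strings so results can be compared regardless of output order. */
  static List<String> normalize(List<int[]> rows) {
    return rows.stream().map(Arrays::toString).sorted().toList();
  }

  public static void main(String[] args) {
    int[][] emps = {{10, 1}, {20, 2}, {10, 3}, {30, 4}}; // {deptno, empid}
    int[][] depts = {{10, 100}, {20, 200}, {40, 400}};   // {deptno, payload}
    System.out.println(normalize(join(emps, depts, false)));
    System.out.println(normalize(join(emps, depts, true)));
  }
}
```

   Both calls print the same rows once output order is normalized. What differs is the cost profile: the merge join needs inputs sorted on the key, while the hash join needs memory for the build side, which is why choosing per scenario and data characteristics makes sense.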
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services