jinxing64 opened a new pull request #1434: [CALCITE-2970] Performance issue when enabling merge join.
URL: https://github.com/apache/calcite/pull/1434

Currently `AbstractConverter` is disabled for EnumerableConvention (and other conventions), so there is no chance to apply a sort-merge join (SMJ) when the inputs of the join are not sorted.

`AbstractConverter` is one way to enable SMJ. However, relying on the expansion of AbstractConverter can be expensive: it may cause a matching explosion in VolcanoPlanner. I ran a test on `JdbcTest#testJoinManyWay -- checkJoinNWay(4)`.

When EnumerableConvention#useAbstractConvertersForConversion is false:
```
Timing Cost: 3 seconds
Rule Matched Times:
org.apache.calcite.rel.rules.AggregateProjectMergeRule:380
org.apache.calcite.rel.rules.ProjectFilterTransposeRule:6
org.apache.calcite.adapter.enumerable.EnumerableProjectRule:196
org.apache.calcite.adapter.enumerable.EnumerableJoinRule:23
org.apache.calcite.adapter.enumerable.EnumerableFilterRule:16
org.apache.calcite.rel.rules.FilterProjectTransposeRule:20
org.apache.calcite.rel.rules.FilterJoinRule$FilterIntoJoinRule:12
org.apache.calcite.rel.rules.JoinCommuteRule:23
org.apache.calcite.rel.rules.ProjectMergeRule:3387
org.apache.calcite.adapter.enumerable.EnumerableAggregateRule:17
org.apache.calcite.adapter.enumerable.EnumerableMergeJoinRule:18
org.apache.calcite.rel.rules.JoinPushThroughJoinRule:45
```

But when EnumerableConvention#useAbstractConvertersForConversion is true:
```
Timing Cost: 52 seconds
Rule Matched Times:
org.apache.calcite.rel.rules.AggregateProjectMergeRule:6240
org.apache.calcite.rel.rules.ProjectFilterTransposeRule:6
org.apache.calcite.adapter.enumerable.EnumerableProjectRule:3190
org.apache.calcite.adapter.enumerable.EnumerableJoinRule:739
org.apache.calcite.adapter.enumerable.EnumerableSortRule:68
org.apache.calcite.rel.rules.FilterProjectTransposeRule:20
org.apache.calcite.rel.rules.SortRemoveRule:111
org.apache.calcite.adapter.enumerable.EnumerableAggregateRule:625
org.apache.calcite.adapter.enumerable.EnumerableMergeJoinRule:738
org.apache.calcite.plan.volcano.AbstractConverter$ExpandConversionRule:70
org.apache.calcite.adapter.enumerable.EnumerableFilterRule:16
org.apache.calcite.rel.rules.FilterJoinRule$FilterIntoJoinRule:12
org.apache.calcite.rel.rules.JoinCommuteRule:763
org.apache.calcite.rel.rules.ProjectMergeRule:44769
org.apache.calcite.rel.rules.JoinPushThroughJoinRule:1827
```

When I run `JdbcTest#testJoinManyWay -- checkJoinNWay(6)` with EnumerableConvention#useAbstractConvertersForConversion set to true, it never returns.

Note that, in the design and implementation of VolcanoRuleCall#matchRecurse, a rule is triggered whenever its root operand or a child operand is matched by a newly created or registered RelNode, so every additional node, including each AbstractConverter, can trigger further matches.

This PR proposes to construct the EnumerableSort operator when creating EnumerableMergeJoin, so that no AbstractConverter is created and the extra matching effort during optimization is avoided. With this change, JdbcTest#testJoinManyWay finishes in 8 seconds, and we can also enable the test VolcanoPlannerTest#testMergeJoin.

This PR also proposes to add a config option indicating whether merge join is enabled. From my understanding, sort-merge join and hash join are both implementations of Join, and by design both should be able to support all join types. Users can choose between them according to their scenario and data characteristics. What's more, if ENUMERABLE_MERGE_JOIN_RULE were enabled by default, many plan checks for EnumerableHashJoin would need to be modified, and I am hesitant to make that large a change.
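
To illustrate the idea of adding the sort at the moment the merge join is created, here is a minimal sketch in a toy plan model. It is not Calcite code and not the code in this PR: `MergeJoinSketch`, `Node`, `Scan`, `Sort`, `MergeJoin`, `ensureSorted` and `createMergeJoin` are all hypothetical names, and a collation is reduced to a single column index for brevity.

```java
/** Toy model only: every class and method here is hypothetical, not Calcite API. */
public class MergeJoinSketch {

    /** A minimal plan node that reports which column its output is sorted on. */
    interface Node {
        /** Column index the output is sorted on, or -1 if the output is unsorted. */
        int sortedOn();

        String describe();
    }

    /** Leaf node: a table scan with a fixed (possibly absent) sort order. */
    static final class Scan implements Node {
        final String table;
        final int sortedOn;

        Scan(String table, int sortedOn) {
            this.table = table;
            this.sortedOn = sortedOn;
        }

        public int sortedOn() { return sortedOn; }

        public String describe() { return "Scan(" + table + ")"; }
    }

    /** Explicit sort on one column; stands in for an EnumerableSort-like operator. */
    static final class Sort implements Node {
        final Node input;
        final int key;

        Sort(Node input, int key) {
            this.input = input;
            this.key = key;
        }

        public int sortedOn() { return key; }

        public String describe() { return "Sort(" + input.describe() + ", key=" + key + ")"; }
    }

    /** Merge join; both inputs are expected to arrive sorted on their join keys. */
    static final class MergeJoin implements Node {
        final Node left;
        final Node right;
        final int leftKey;
        final int rightKey;

        MergeJoin(Node left, Node right, int leftKey, int rightKey) {
            this.left = left;
            this.right = right;
            this.leftKey = leftKey;
            this.rightKey = rightKey;
        }

        // Left columns come first in the join output, so the index is unchanged.
        public int sortedOn() { return leftKey; }

        public String describe() {
            return "MergeJoin(" + left.describe() + ", " + right.describe() + ")";
        }
    }

    /** Wraps the input in a Sort only when it is not already sorted on the key. */
    static Node ensureSorted(Node input, int key) {
        return input.sortedOn() == key ? input : new Sort(input, key);
    }

    /** Builds the join with any required sorts inline, with no placeholder converter node. */
    static MergeJoin createMergeJoin(Node left, Node right, int leftKey, int rightKey) {
        return new MergeJoin(
            ensureSorted(left, leftKey), ensureSorted(right, rightKey), leftKey, rightKey);
    }

    public static void main(String[] args) {
        Node emps = new Scan("emps", -1);   // unsorted input
        Node depts = new Scan("depts", 0);  // already sorted on column 0
        // prints: MergeJoin(Sort(Scan(emps), key=1), Scan(depts))
        System.out.println(createMergeJoin(emps, depts, 1, 0).describe());
    }
}
```

In this toy model the sort is decided once, when the join is built, so no intermediate converter node is registered for other rules to match against. How the real planner weighs this against a hash join is a matter of cost and is outside what the sketch shows.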
