Hello Haisheng, thanks for bringing up this discussion. I was also part of the people who expressed concerns about CALCITE-3997 due to a backwards compatibility issue in a very specific scenario in a downstream project. This issue was satisfactory handled with a temporary workaround for the RC, and now this workaround needs to be removed for the next version (CALCITE-4032). I agree with your analysis, and I think my specific scenario should not prevent this new general approach to be fully implemented (including CalcMergerRule). Thus, I am ok with CALCITE-4032, I will implement my own rule on my side to solve my specific problem, under no circumstances this should be the responsibility of the Calcite library.
Best regards, Ruben Le lun. 1 juin 2020 à 23:31, Stamatis Zampetakis <[email protected]> a écrit : > Hello, > > Since I was among the people who raised some concerns about this during the > vote I would like to mention that it was mostly about the fact it came very > late in the release cycle. > It is better to avoid breaking changes in the RC phase since it may be > difficult for people to react or test it. > > I agree with Haisheng that Calcite code should follow the best practice so > that other projects can draw inspiration from it. Nevertheless, I would > like to mention again that Enumerable is not a convention that is used only > for demonstration purposes but it is also used in production systems so we > have to be careful about backward compatibility. Still nobody complaint so > far so it means that we are doing pretty good on this aspect :) > > The release brought some very cool features and I fully support the > direction on which we are going. I don't think there is a need to open a > new thread here since we can discuss under the respective jira(s) if > necessary. > > Best, > Stamatis > > > > > > > On Wed, May 27, 2020 at 5:07 AM Haisheng Yuan <[email protected]> wrote: > > > Hi all, > > > > During vote of 1.23.0 rc1, there are some discussions about the solution > > of CALCITE-3997. > > > > Some may think it is a matter of best practice, but in many cases it is a > > matter of right or wrong. > > > > Here is the rationale. > > > > The original query in CALCITE-3997 is > > > > SELECT t1.k1, t2.k1 FROM t1, t2 > > ON t1.k1=t2.k1 > > WHERE t1.n1 + 1 = t2.n3 > > > > After FilterIntoJoin and MergeJoinRule, planner generates an alternative: > > MergeJoin on t1.k1=t2.k1 and t1.n1 + 1 = t2.n3 > > > > The mergejoin's collation is [k1]. > > > > This mergejoin matches JoinPushExpressionsRule, which pushes expression > in > > join condition, and generates a Project in child. > > Project t1.k1, t2.k1 > > -> MergeJoin on t1.k1=t2.k1 and t1.col = t2.n3 > > -> Project k1, n1+1 as col > > -> TableScan t1 > > -> TableScan t2 > > > > The new mergejoin now has 2 equi-conditions, but the collation is the > same > > with old mergejoin, still [k1], which is wrong. > > It should be updated to [k1,col] and [k1, n3]. Hence error. > > > > The similar issue already happened before, see CALCITE-3353. And it is > > very easy to trigger same error using different query even without > > FilterIntoJoin. If we change WHERE to AND in the query, > > SELECT t1.k1, t2.k1 FROM t1, t2 > > ON t1.k1=t2.k1 > > AND t1.n1 + 1 = t2.n3 > > > > or with the following query with join condition const reductions on, we > > can still see the issue. > > SELECT t1.k1, t2.k1 FROM t1, t2 > > ON t1.k1=t2.k1 > > AND (t1.n1 + 1 = t2.n3 or t1.n1 + 1 = t2.n3) > > > > Can we use the similar approach that in CALCITE-3353 to fix the issue? > > We can, but it just hide the issue. Doing that way can generate mergejoin > > with correct traitset, but the rule is still generating an invalid plan. > > Because the MergeJoin's new input is a LogicalProject, which is generated > > by RelBuilder. > > > > In Calcite, RelNode is a very special class. Unlike operators in Cascades > > or Volcano paper, where logical/physical operator and plan expression are > > separated, when constructing logical/physical expression tree, each > > operator doesn't require anything trait from its child. But RelNode in > > Calcite not only represents operator, but also plan expression, and can > > even represent a MEMO group with required traitset, which is RelSubset. > > When constructing a RelNode XXX, we need to specify the input RelNodes, > the > > input nodes' traitsets will become node XXX's requirement on those > inputs, > > even we are not aware that we are requesting it, which is extremely > > error-prone. > > > > In above case, the new MergeJoin's left input is a LogicalProject, which > > means the physical MergeJoin is requesting its left input to be logical > > operator with NONE convention, which has {inf} large cost. That means the > > newly generated MergeJoin is never implementable. > > > > What if we have a physical RelBuilder to generate the Project in the > rule? > > The consequence is worse. > > With logical RelBuilder, it just generates an invalid plan, we are still > > safe because it won't be selected as the best plan. If we get a physical > > RelBuilder, which generates a physical plan that is implementable, but > this > > is a wrong plan. Because in RelOptUtil.pushDownJoinConditions, when > > creating the MergeJoin, it just specifies its traitset and inputs, but it > > doesn't request the correct collation on the MergeJoin's inputs. The > > Project's collation is EMPTY, that means the MergeJoin doesn't require > any > > collation on the Project. Apparently this will be a wrong plan. > > > > This kind of issue doesn't limit to JoinPushExpressionsRule, any rule > that > > has Join as operand may have the same issue. In fact, all rules that has > > Aggregate as operand may suffer the same issue. Because in 1.23.0, we > add a > > new physical operator EnumerableSortedAggregate, which requires its input > > to be sorted by group keys. AggregateProjectMergeRule can generate wrong > > plan if we apply it on EnumerableSortedAggregate. In addition, we are now > > discussing about adding a new EnumerableMergeUnion, to require the > UNION's > > input sorted. Maybe in next few releases, we will have sort based > > EnumerableWindow operator. Finally we may find that only ProjectMerge, > > FilterMerge, CalcMerge can work correctly with physical operators, > because > > these operators doesn't require any trait from input, so it just happens > to > > work. The change of TransformationRule looks risky at first glance, > > however, it makes those rules less risky. > > > > Some thinks we should provide different rule instances / types to match > > logical/physical operators. But IMO, these built-in transformation rules > > shouldn't match physical operators at all in VolcanoPlanner, because > most, > > if not all, of them can't worked correctly with physical operators. Even > > for Project/Filter/Calc merge rules, all the merge work can be done > through > > logical operators, applying on physical operators will duplicate the > > process again. > > > > Even if we just view this as a best practice, and we (as Calcite project > > members) don't follow the best practice on Calcite's built-in > > EnumerableConvention, how can we tell downstream projects to follow the > > best practices? Is there any document stating the best practices? Most of > > them just follow what Calcite does. Just like children follow the example > > of their parents. If we checkout the code of Apache Druid, Apache Beam, > we > > will find that they are doing the similar way as Calcite's Enumerable > > Convention. Beam even adds JoinCommuteRule, JoinAssociateRule, > > JoinPushThroughJoinRule.RIGHT, JoinPushThroughJoinRule.LEFT all together > in > > its ruleset. I guess join performance is not a problem for BEAM. > > > > Finally, I don't think CalcMergeRule should be an exception, nothing is > > impossible without physical CalcMergeRule. > > > > Thanks, > > Haisheng > > > > > > >
