Great, that is the correct way to change it and that should be the default implementation.
On 2021/05/28 17:41:15, Vladimir Ozerov <ppoze...@gmail.com> wrote: > BTW, I tried to change the implementation to: > > 1: protected boolean isTransformationRule(VolcanoRuleCall match) { > 2: return match.getRule() instanceof TransformationRule; > 3: } > > It solved my problem - plans returned to normal. In the Apache Calcite > repo, only 4 tests in the TopDowOptTest class failed due to a minor > operator reordering. > > пт, 28 мая 2021 г. в 20:37, Vladimir Ozerov <ppoze...@gmail.com>: > > > Hi Jinpeng, > > > > Thank you for the clarification. When I saw the code in question for the > > first time, my first thought was that it was perhaps designed for gradual > > migration. The main problem is that the current implementation discards > > parts of the plan *silently*, which might be difficult to spot. I > > only spotted the problem in my specific case because I had ~100 tests with > > complex queries. Otherwise, I would happily proceed with the new rule > > without knowing that I lost important parts of the search space. > > > > That said, I think we can do the following: > > > > 1. Emit a warning if or even throw an exception if the transformation > > rule produced a physical node. This should be trivial to implement - add > > an > > expected node type to VolcanoRuleCall (e.g., "logical", "physical", > > "any"). > > The warning/exception should contain a proper fix suggestion - to > > override > > the VolcanoPlanner.isTransformationRule. > > 2. Alternatively - do a breaking change. Apache Calcite doesn't have a > > major release cadence. It is normal practice in many products to do > > breaking changes in minor releases. Even popular products like Mongo or > > DataStax do it regularly. We may inform the user in the first release and > > change to "rule instanceof TransformationRule" in the next release. > > > > Does it make sense? > > > > Regards, > > Vladimir. > > > > пт, 28 мая 2021 г. в 19:33, Jinpeng Wu <wjpabc...@gmail.com>: > > > >> Hi, Vladimir. Good catch! There could be some improvements here. > >> > >> Actually, this problem was discovered early when the top-down rule driver > >> was designed. At that time, no rule was annotated as "TransformationRule". > >> Moreover, it is impossible to ask every calcite user who designed their > >> own > >> rules to annotate the existing code. So the top-down rule driver was > >> designed so that it can: > >> 1. Work in chaos: even if there are no hints for rule types, it can still > >> work. Some opportunities may be lost, but NO failures, NO exceptions, and > >> NO worse than the original driver. People can migrate to the new driver > >> without concern. > >> 2. Be Improvable: Users can refactor their code, if they want, step by > >> step. As rule types become more and more accurate, the system achieves > >> more > >> and more benefits > >> 3. Be easy to customize: the default implementation is designed for the > >> most common cases, so that most users can benefit from it without much > >> effort. But it is not possible to fulfill all requirements as different > >> systems could have very different patterns to define logical and > >> physical. That's why the isTransformationRule method is put in > >> VolcanoPlanner and marked as protected: overwriting it can be very simple. > >> > >> Moreover, losing some "derive" opportunities is not as serious as > >> imagination. As I mentioned in previous discussions, parents are in charge > >> of raising as many requirements as possible. During "derive", if specific > >> traits were not built by children, it means that no parents were requiring > >> that. And if parents finally require that traits in the latter > >> optimization, passThrough methods get called and new physical nodes are > >> generated and "derive" get called again. > >> I tested it on millions of queries, with or without correct rule types, in > >> my own product. The performance of group pruning varies a lot. But the > >> output plans are almost the same. Only one obvious exception was > >> discovered: the spool node. That's because spool nodes cannot "passThough" > >> parent traits (it could have multiple parents and current framework cannot > >> handle such a situation) while it can "derive" input traits. > >> > >> Of course, this conclusion may not apply to your product as we could have > >> quite different rule sets. I am just sharing some of my experiences. Maybe > >> the current implementation of "isTransformationRule" is not good enough. > >> If > >> you have any better solutions, please share them. > >> > >> Thanks, > >> Jinpeng Wu > >> > >> On Fri, May 28, 2021 at 7:10 PM Vladimir Ozerov <ppoze...@gmail.com> > >> wrote: > >> > >> > Hi, > >> > > >> > I have an optimizer that uses top-down VolcanoPlanner and has a > >> > ConverterRule for every LogicalNode. I have a new requirement when one > >> of > >> > the physical rules must emit several physical nodes instead of one. I > >> tried > >> > to convert a ConverterRule to a normal rule that emits physical nodes. > >> Even > >> > though the new rule has exactly the same pattern and logic as the > >> previous > >> > ConverterRule, plans changed. Analysis showed that this likely due to a > >> bug > >> > in the top-down optimizer as explained below. > >> > > >> > When optimizing a logical node, the top-down first schedules the > >> > transformation rules, and then implementation rules. The logic to check > >> > whether the rule is transformation rule or not is located in > >> > VolcanoPlanner.isTransformationRule [1]. The rule scheduling logic > >> ensures > >> > that a given rule is executed either as a transformation rule, or an > >> > implementation rule, but not both. See TopDowRuleQueue.popMatch. The > >> > top-down optimizer schedules tasks in a stack. So even though the > >> > transformation rules are scheduled before implementation rules, the > >> latter > >> > executed first. > >> > > >> > If an implementation rule produces a physical node, this node will be > >> > notified about input traits in the "derive" phase. In contrast, > >> > transformation rules produce logical nodes only, and this happens after > >> the > >> > derivation of the inputs is completed. Therefore, if the top-down > >> optimizer > >> > mistakenly treats an implementation rule as a transformation rule, > >> "derive" > >> > will not be called on the produced physical nodes, leading to incomplete > >> > search space exploration. > >> > > >> > It seems, that this is exactly what happens in the current > >> implementation. > >> > The VolcanoPlanner.isTransformationRule looks like this: > >> > > >> > 1: protected boolean isTransformationRule(VolcanoRuleCall match) { > >> > 2: if (match.getRule() instanceof SubstitutionRule) { > >> > 3: return true; > >> > 4: } > >> > 5: if (match.getRule() instanceof ConverterRule > >> > 6: && match.getRule().getOutTrait() == rootConvention) { > >> > 7: return false; > >> > 8: } > >> > 9: return match.getRule().getOperand().trait == Convention.NONE > >> > 10: || match.getRule().getOperand().trait == null; > >> > 11: } > >> > > >> > If the rule is a ConverterRule and it produces the node with the target > >> > convention, it is treated as an implementation rule (lines 5-6). But if > >> the > >> > rule is not a ConverterRule, the method tries to deduce the rule's type > >> > from the incoming convention (lines 9-10). In practice, implementation > >> > rules either do not care about the incoming trait or expect the NONE > >> trait. > >> > Therefore, it seems that currently, the top-down optimizer treats many > >> > implementation rules as physical rules, and as a result, cannot notify > >> > physical nodes produced from these rules about trait derivation. > >> > > >> > This explains why in my case everything was ok when all implementation > >> > rules were ConverterRules, and why I lost some optimal plans when the > >> rule > >> > was refactored to a non-converter variant. > >> > > >> > Do you agree that this a bug? If yes, shouldn't we refactor that code to > >> > just check whether the rule is an instance of TransformationRule? Since > >> > this is a breaking change, we may add a special flag that preserves the > >> old > >> > behavior by default but allows for the new behavior to overcome the > >> > aforementioned problem. > >> > > >> > Regards, > >> > Vladimir. > >> > > >> > [1] > >> > > >> > > >> https://github.com/apache/calcite/blob/calcite-1.26.0/core/src/main/java/org/apache/calcite/plan/volcano/VolcanoPlanner.java#L1398-L1408 > >> > > >> > > >