Hi. Warnings and Fixing isTransformationRule are good solutions. But I am still concerned about reporting an error. It will force users to do a large refactoring on existing codes before they can migrate to the new rule driver. Any refactoring can be risky, especially for those critical services. It leaves those systems no choice but to keep using old versions of calcite. However, we usually can still get a good plan even without correct rule types. I don't think it worthwhile to introduce that change.
Thanks Jinpeng Wu On Sun, May 30, 2021 at 5:19 AM Vladimir Ozerov <[email protected]> wrote: > Great. Does the community have any objections to the following fix? > 1. Add a flag to a rule call instance with the expected node type. In the > case of a mismatch, we will print a warning once per rule per JVM instance > to avoid too many messages. Alternatively, we may print a warning per rule > per VolcanoPlanner, but I am concerned with too many repetitive messages > because VolcanoPlanner is usually instantiated per SQL query. > 2. In the next release, (1) replace the warning with an error, (2) change > VolcanoPlanner.isTransformationRule as discussed above. > > > > Пт, 28 мая 2021 г. в 21:27, Haisheng Yuan <[email protected]>: > > > Great, that is the correct way to change it and that should be the > default > > implementation. > > > > On 2021/05/28 17:41:15, Vladimir Ozerov <[email protected]> wrote: > > > BTW, I tried to change the implementation to: > > > > > > 1: protected boolean isTransformationRule(VolcanoRuleCall match) { > > > 2: return match.getRule() instanceof TransformationRule; > > > 3: } > > > > > > It solved my problem - plans returned to normal. In the Apache Calcite > > > repo, only 4 tests in the TopDowOptTest class failed due to a minor > > > operator reordering. > > > > > > пт, 28 мая 2021 г. в 20:37, Vladimir Ozerov <[email protected]>: > > > > > > > Hi Jinpeng, > > > > > > > > Thank you for the clarification. When I saw the code in question for > > the > > > > first time, my first thought was that it was perhaps designed for > > gradual > > > > migration. The main problem is that the current implementation > discards > > > > parts of the plan *silently*, which might be difficult to spot. I > > > > only spotted the problem in my specific case because I had ~100 tests > > with > > > > complex queries. Otherwise, I would happily proceed with the new rule > > > > without knowing that I lost important parts of the search space. > > > > > > > > That said, I think we can do the following: > > > > > > > > 1. Emit a warning if or even throw an exception if the > > transformation > > > > rule produced a physical node. This should be trivial to implement > > - add an > > > > expected node type to VolcanoRuleCall (e.g., "logical", > "physical", > > "any"). > > > > The warning/exception should contain a proper fix suggestion - to > > override > > > > the VolcanoPlanner.isTransformationRule. > > > > 2. Alternatively - do a breaking change. Apache Calcite doesn't > > have a > > > > major release cadence. It is normal practice in many products to > do > > > > breaking changes in minor releases. Even popular products like > > Mongo or > > > > DataStax do it regularly. We may inform the user in the first > > release and > > > > change to "rule instanceof TransformationRule" in the next > release. > > > > > > > > Does it make sense? > > > > > > > > Regards, > > > > Vladimir. > > > > > > > > пт, 28 мая 2021 г. в 19:33, Jinpeng Wu <[email protected]>: > > > > > > > >> Hi, Vladimir. Good catch! There could be some improvements here. > > > >> > > > >> Actually, this problem was discovered early when the top-down rule > > driver > > > >> was designed. At that time, no rule was annotated as > > "TransformationRule". > > > >> Moreover, it is impossible to ask every calcite user who designed > > their > > > >> own > > > >> rules to annotate the existing code. So the top-down rule driver was > > > >> designed so that it can: > > > >> 1. Work in chaos: even if there are no hints for rule types, it can > > still > > > >> work. Some opportunities may be lost, but NO failures, NO > exceptions, > > and > > > >> NO worse than the original driver. People can migrate to the new > > driver > > > >> without concern. > > > >> 2. Be Improvable: Users can refactor their code, if they want, step > by > > > >> step. As rule types become more and more accurate, the system > achieves > > > >> more > > > >> and more benefits > > > >> 3. Be easy to customize: the default implementation is designed for > > the > > > >> most common cases, so that most users can benefit from it without > much > > > >> effort. But it is not possible to fulfill all requirements as > > different > > > >> systems could have very different patterns to define logical and > > > >> physical. That's why the isTransformationRule method is put in > > > >> VolcanoPlanner and marked as protected: overwriting it can be very > > simple. > > > >> > > > >> Moreover, losing some "derive" opportunities is not as serious as > > > >> imagination. As I mentioned in previous discussions, parents are in > > charge > > > >> of raising as many requirements as possible. During "derive", if > > specific > > > >> traits were not built by children, it means that no parents were > > requiring > > > >> that. And if parents finally require that traits in the latter > > > >> optimization, passThrough methods get called and new physical nodes > > are > > > >> generated and "derive" get called again. > > > >> I tested it on millions of queries, with or without correct rule > > types, in > > > >> my own product. The performance of group pruning varies a lot. But > the > > > >> output plans are almost the same. Only one obvious exception was > > > >> discovered: the spool node. That's because spool nodes cannot > > "passThough" > > > >> parent traits (it could have multiple parents and current framework > > cannot > > > >> handle such a situation) while it can "derive" input traits. > > > >> > > > >> Of course, this conclusion may not apply to your product as we could > > have > > > >> quite different rule sets. I am just sharing some of my experiences. > > Maybe > > > >> the current implementation of "isTransformationRule" is not good > > enough. > > > >> If > > > >> you have any better solutions, please share them. > > > >> > > > >> Thanks, > > > >> Jinpeng Wu > > > >> > > > >> On Fri, May 28, 2021 at 7:10 PM Vladimir Ozerov <[email protected] > > > > > >> wrote: > > > >> > > > >> > Hi, > > > >> > > > > >> > I have an optimizer that uses top-down VolcanoPlanner and has a > > > >> > ConverterRule for every LogicalNode. I have a new requirement when > > one > > > >> of > > > >> > the physical rules must emit several physical nodes instead of > one. > > I > > > >> tried > > > >> > to convert a ConverterRule to a normal rule that emits physical > > nodes. > > > >> Even > > > >> > though the new rule has exactly the same pattern and logic as the > > > >> previous > > > >> > ConverterRule, plans changed. Analysis showed that this likely due > > to a > > > >> bug > > > >> > in the top-down optimizer as explained below. > > > >> > > > > >> > When optimizing a logical node, the top-down first schedules the > > > >> > transformation rules, and then implementation rules. The logic to > > check > > > >> > whether the rule is transformation rule or not is located in > > > >> > VolcanoPlanner.isTransformationRule [1]. The rule scheduling logic > > > >> ensures > > > >> > that a given rule is executed either as a transformation rule, or > an > > > >> > implementation rule, but not both. See TopDowRuleQueue.popMatch. > The > > > >> > top-down optimizer schedules tasks in a stack. So even though the > > > >> > transformation rules are scheduled before implementation rules, > the > > > >> latter > > > >> > executed first. > > > >> > > > > >> > If an implementation rule produces a physical node, this node will > > be > > > >> > notified about input traits in the "derive" phase. In contrast, > > > >> > transformation rules produce logical nodes only, and this happens > > after > > > >> the > > > >> > derivation of the inputs is completed. Therefore, if the top-down > > > >> optimizer > > > >> > mistakenly treats an implementation rule as a transformation rule, > > > >> "derive" > > > >> > will not be called on the produced physical nodes, leading to > > incomplete > > > >> > search space exploration. > > > >> > > > > >> > It seems, that this is exactly what happens in the current > > > >> implementation. > > > >> > The VolcanoPlanner.isTransformationRule looks like this: > > > >> > > > > >> > 1: protected boolean isTransformationRule(VolcanoRuleCall match) > { > > > >> > 2: if (match.getRule() instanceof SubstitutionRule) { > > > >> > 3: return true; > > > >> > 4: } > > > >> > 5: if (match.getRule() instanceof ConverterRule > > > >> > 6: && match.getRule().getOutTrait() == rootConvention) { > > > >> > 7: return false; > > > >> > 8: } > > > >> > 9: return match.getRule().getOperand().trait == > Convention.NONE > > > >> > 10: || match.getRule().getOperand().trait == null; > > > >> > 11: } > > > >> > > > > >> > If the rule is a ConverterRule and it produces the node with the > > target > > > >> > convention, it is treated as an implementation rule (lines 5-6). > > But if > > > >> the > > > >> > rule is not a ConverterRule, the method tries to deduce the rule's > > type > > > >> > from the incoming convention (lines 9-10). In practice, > > implementation > > > >> > rules either do not care about the incoming trait or expect the > NONE > > > >> trait. > > > >> > Therefore, it seems that currently, the top-down optimizer treats > > many > > > >> > implementation rules as physical rules, and as a result, cannot > > notify > > > >> > physical nodes produced from these rules about trait derivation. > > > >> > > > > >> > This explains why in my case everything was ok when all > > implementation > > > >> > rules were ConverterRules, and why I lost some optimal plans when > > the > > > >> rule > > > >> > was refactored to a non-converter variant. > > > >> > > > > >> > Do you agree that this a bug? If yes, shouldn't we refactor that > > code to > > > >> > just check whether the rule is an instance of TransformationRule? > > Since > > > >> > this is a breaking change, we may add a special flag that > preserves > > the > > > >> old > > > >> > behavior by default but allows for the new behavior to overcome > the > > > >> > aforementioned problem. > > > >> > > > > >> > Regards, > > > >> > Vladimir. > > > >> > > > > >> > [1] > > > >> > > > > >> > > > > >> > > > https://github.com/apache/calcite/blob/calcite-1.26.0/core/src/main/java/org/apache/calcite/plan/volcano/VolcanoPlanner.java#L1398-L1408 > > > >> > > > > >> > > > > > > > > > >
