Re: Top-down optimizer cannot explore the search space because physical rule is treated as transformation rule

Haisheng Yuan Fri, 28 May 2021 11:28:00 -0700

Great, that is the correct way to change it and that should be the default 
implementation.


On 2021/05/28 17:41:15, Vladimir Ozerov <ppoze...@gmail.com> wrote: 
> BTW, I tried to change the implementation to:
> 
>  1: protected boolean isTransformationRule(VolcanoRuleCall match) {
>  2:    return match.getRule() instanceof TransformationRule;
>  3: }
> 
> It solved my problem - plans returned to normal. In the Apache Calcite
> repo, only 4 tests in the TopDowOptTest class failed due to a minor
> operator reordering.
> 
> пт, 28 мая 2021 г. в 20:37, Vladimir Ozerov <ppoze...@gmail.com>:
> 
> > Hi Jinpeng,
> >
> > Thank you for the clarification. When I saw the code in question for the
> > first time, my first thought was that it was perhaps designed for gradual
> > migration. The main problem is that the current implementation discards
> > parts of the plan *silently*, which might be difficult to spot. I
> > only spotted the problem in my specific case because I had ~100 tests with
> > complex queries. Otherwise, I would happily proceed with the new rule
> > without knowing that I lost important parts of the search space.
> >
> > That said, I think we can do the following:
> >
> >    1. Emit a warning if or even throw an exception if the transformation
> >    rule produced a physical node. This should be trivial to implement - add 
> > an
> >    expected node type to VolcanoRuleCall (e.g., "logical", "physical", 
> > "any").
> >    The warning/exception should contain a proper fix suggestion - to 
> > override
> >    the VolcanoPlanner.isTransformationRule.
> >    2. Alternatively - do a breaking change. Apache Calcite doesn't have a
> >    major release cadence. It is normal practice in many products to do
> >    breaking changes in minor releases. Even popular products like Mongo or
> >    DataStax do it regularly. We may inform the user in the first release and
> >    change to "rule instanceof TransformationRule" in the next release.
> >
> > Does it make sense?
> >
> > Regards,
> > Vladimir.
> >
> > пт, 28 мая 2021 г. в 19:33, Jinpeng Wu <wjpabc...@gmail.com>:
> >
> >> Hi, Vladimir. Good catch! There could be some improvements here.
> >>
> >> Actually, this problem was discovered early when the top-down rule driver
> >> was designed. At that time, no rule was annotated as "TransformationRule".
> >> Moreover, it is impossible to ask every calcite user who designed their
> >> own
> >> rules to annotate the existing code. So the top-down rule driver was
> >> designed so that it can:
> >> 1. Work in chaos: even if there are no hints for rule types, it can still
> >> work. Some opportunities may be lost, but NO failures, NO exceptions, and
> >> NO worse than the original driver. People can migrate to the new driver
> >> without concern.
> >> 2. Be Improvable: Users can refactor their code, if they want, step by
> >> step. As rule types become more and more accurate, the system achieves
> >> more
> >> and more benefits
> >> 3. Be easy to customize: the default implementation is designed for the
> >> most common cases, so that most users can benefit from it without much
> >> effort. But it is not possible to fulfill all requirements as different
> >> systems could have very different patterns to define logical and
> >> physical. That's why the isTransformationRule method is put in
> >> VolcanoPlanner and marked as protected: overwriting it can be very simple.
> >>
> >> Moreover, losing some "derive" opportunities is not as serious as
> >> imagination. As I mentioned in previous discussions, parents are in charge
> >> of raising as many requirements as possible. During "derive", if specific
> >> traits were not built by children, it means that no parents were requiring
> >> that. And if parents finally require that traits in the latter
> >> optimization, passThrough methods get called and new physical nodes are
> >> generated and "derive" get called again.
> >> I tested it on millions of queries, with or without correct rule types, in
> >> my own product. The performance of group pruning varies a lot. But the
> >> output plans are almost the same. Only one obvious exception was
> >> discovered: the spool node. That's because spool nodes cannot "passThough"
> >> parent traits (it could have multiple parents and current framework cannot
> >> handle such a situation) while it can "derive" input traits.
> >>
> >> Of course, this conclusion may not apply to your product as we could have
> >> quite different rule sets. I am just sharing some of my experiences. Maybe
> >> the current implementation of "isTransformationRule" is not good enough.
> >> If
> >> you have any better solutions, please share them.
> >>
> >> Thanks,
> >> Jinpeng Wu
> >>
> >> On Fri, May 28, 2021 at 7:10 PM Vladimir Ozerov <ppoze...@gmail.com>
> >> wrote:
> >>
> >> > Hi,
> >> >
> >> > I have an optimizer that uses top-down VolcanoPlanner and has a
> >> > ConverterRule for every LogicalNode. I have a new requirement when one
> >> of
> >> > the physical rules must emit several physical nodes instead of one. I
> >> tried
> >> > to convert a ConverterRule to a normal rule that emits physical nodes.
> >> Even
> >> > though the new rule has exactly the same pattern and logic as the
> >> previous
> >> > ConverterRule, plans changed. Analysis showed that this likely due to a
> >> bug
> >> > in the top-down optimizer as explained below.
> >> >
> >> > When optimizing a logical node, the top-down first schedules the
> >> > transformation rules, and then implementation rules. The logic to check
> >> > whether the rule is transformation rule or not is located in
> >> > VolcanoPlanner.isTransformationRule [1]. The rule scheduling logic
> >> ensures
> >> > that a given rule is executed either as a transformation rule, or an
> >> > implementation rule, but not both. See TopDowRuleQueue.popMatch. The
> >> > top-down optimizer schedules tasks in a stack. So even though the
> >> > transformation rules are scheduled before implementation rules, the
> >> latter
> >> > executed first.
> >> >
> >> > If an implementation rule produces a physical node, this node will be
> >> > notified about input traits in the "derive" phase. In contrast,
> >> > transformation rules produce logical nodes only, and this happens after
> >> the
> >> > derivation of the inputs is completed. Therefore, if the top-down
> >> optimizer
> >> > mistakenly treats an implementation rule as a transformation rule,
> >> "derive"
> >> > will not be called on the produced physical nodes, leading to incomplete
> >> > search space exploration.
> >> >
> >> > It seems, that this is exactly what happens in the current
> >> implementation.
> >> > The VolcanoPlanner.isTransformationRule looks like this:
> >> >
> >> >  1: protected boolean isTransformationRule(VolcanoRuleCall match) {
> >> >  2:    if (match.getRule() instanceof SubstitutionRule) {
> >> >  3:      return true;
> >> >  4:    }
> >> >  5:    if (match.getRule() instanceof ConverterRule
> >> >  6:        && match.getRule().getOutTrait() == rootConvention) {
> >> >  7:      return false;
> >> >  8:    }
> >> >  9:    return match.getRule().getOperand().trait == Convention.NONE
> >> > 10:        || match.getRule().getOperand().trait == null;
> >> > 11: }
> >> >
> >> > If the rule is a ConverterRule and it produces the node with the target
> >> > convention, it is treated as an implementation rule (lines 5-6). But if
> >> the
> >> > rule is not a ConverterRule, the method tries to deduce the rule's type
> >> > from the incoming convention (lines 9-10). In practice, implementation
> >> > rules either do not care about the incoming trait or expect the NONE
> >> trait.
> >> > Therefore, it seems that currently, the top-down optimizer treats many
> >> > implementation rules as physical rules, and as a result, cannot notify
> >> > physical nodes produced from these rules about trait derivation.
> >> >
> >> > This explains why in my case everything was ok when all implementation
> >> > rules were ConverterRules, and why I lost some optimal plans when the
> >> rule
> >> > was refactored to a non-converter variant.
> >> >
> >> > Do you agree that this a bug? If yes, shouldn't we refactor that code to
> >> > just check whether the rule is an instance of TransformationRule? Since
> >> > this is a breaking change, we may add a special flag that preserves the
> >> old
> >> > behavior by default but allows for the new behavior to overcome the
> >> > aforementioned problem.
> >> >
> >> > Regards,
> >> > Vladimir.
> >> >
> >> > [1]
> >> >
> >> >
> >> https://github.com/apache/calcite/blob/calcite-1.26.0/core/src/main/java/org/apache/calcite/plan/volcano/VolcanoPlanner.java#L1398-L1408
> >> >
> >>
> >
>

Re: Top-down optimizer cannot explore the search space because physical rule is treated as transformation rule

Reply via email to