Re: Re: Re: Re: Re: [DISCUSS] Proposal to add API to force rules matching specific rels

Vladimir Ozerov Thu, 19 Mar 2020 12:38:10 -0700

Hi Haisheng,

Thank you for pointing to 1.23.0 changes, I'll try to use this version.


Regards,
Vladimir.

вс, 15 мар. 2020 г. в 08:05, Haisheng Yuan <[email protected]>:

> Thanks for your detailed explanation, Vladimir.
>
> Indeed, the only way to propagate traits in Calcite currently is using
> rules, which is a big pain. I can feel your pain. I tried to come up ways
> to implement the trait derivation and requirement in the framwork without
> breaking current usages, only turns out it is almost impossible. It has too
> many stakeholders, even a small change may incur opposition.
>
> But before we get the real top-down cascades framework, there are still a
> lot you can do to improve your planner's performance.
>
> Since Calcite 1.22.0, I committed a change that enabes RelSubset to be
> used to trigger a rule, which can greatly reduce the number of rule calls
> for trait propagation. With your example, you need 2 rules:
> 1. Physical implementation rule
>   match pattern: operand(LogicalProject.class)
>   Produce PhysicalProject without trait
> 2. Project trait propagtion rule
>   match pattern: operand(PhysicalProject.class, operand(RelSubset.class))
>   Produce PhysicalProject with derived trait.
>
> Since 1.23.0, we removed the rule match importances and ordering, I guess
> the can reduce the optimizatino time around 10~20% for some complex queries
> with many rule calls.
>
> - Haisheng
>
> ------------------------------------------------------------------
> 发件人：Vladimir Ozerov<[email protected]>
> 日 期：2020年03月15日 04:18:53
> 收件人：Haisheng Yuan<[email protected]>
> 抄 送：[email protected] ([email protected])<[email protected]
> >
> 主 题：Re: Re: Re: Re: [DISCUSS] Proposal to add API to force rules matching
> specific rels
>
> Hi Haisheng,
>
> You are right, the behavior I showed is not the default one, I should
> provide more context here. This is how we (Hazelcast) and at least Drill
> use the engine to ensure that the produced plan is optimal, I gave an
> example in [1].
> In real distributed engines, we rely on physical properties heavily. Most
> notably, distribution and collation. And the fundamental problem with the
> VolcanoOptimizer, is that it cannot propagate traits in a controlled
> manner. This, in turn, forces us to use AbstractConverters and implement
> rules in ways, which appear strange to Calcite community :-). And this, in
> turn, leads to excessive rule calls and failure to plan complex queries.
>
> Let's consider the same tree again, but now assuming that this is not the
> complete tree, but a subtree, and there are some parent operators. Let's
> also assume that the ScanRule may produce two equivalent operators with
> different physical properties: PhysicalScan and PhysicalIndexScan[a ASC].
> It is important to consider both alternatives in parent operators. Now
> let's consider two different ways to optimize that subtree.
>
> 1. Canonical Calcite way (default)
> 1.1 Perform initial rules ordering, parent rules fire first: [ProjectRule,
> ScanRule]
> 1.2 Invoke ProjectRule, which produces physical project without any
> physical traits
> 1.3 Invoke ScanRule, which produces, PhysicalScan and PhysicalIndexScan[a
> ASC]
> Since ProjectRule was not aware of different scan alternatives, it missed
> collation, and resulting hypergraph looks like this:
>
> -- PhysicalProject
> ---- [PhysicalScan, PhysicalIndexScan[a ASC]]
>
> This plan is suboptimal, because of parent operators cannot take advantage
> of collation.
>
> 2. Hazelast/Drill way:
> 2.1 Enable abstract converters
> 2.2 Rules are ordered in the same way as in example 1: [ProjectRule,
> ScanRule]
> 2.3 ProjectRule fires, enumerates physical implementations of the input.
> Since there are no physical inputs yet, it exits without any
> transformations
> 2.4 ScanRule fires and produces two physical scans
> 2.5 Abstract converters ensure that the ProjectRule is scheduled for
> execution again because it's input RelSet has new nodes
> 2.6 ProjectRule fires again, now having physical inputs, and generates
> one implementation per unique combination of physical input traits.
>
> As a result, we have two graphs now:
>
> Graph 1:
> -- PhysicalProject
> ---- PhysicalScan
>
> Graph 2:
> -- PhysicalProject[a ASC]
> ---- PhysicalIndexScan[a ASC]
>
> Notice how we propagated physical collation. Now parent operators may take
> advantage of it, e.g. eliminate sorting, or perform streaming aggregation
> instead of blocking hash aggregation.
>
> This is the fundamental problem we have in Hazelcast: how to ensure the
> complete exploration of the search space without excessive rule calls.
>
> Very short summary:
> 1. The default behavior of VolcanoOptimizer cannot explore the physical
> search space, so plans are not optimal
> 2. Abstract converters fix this if you follow a certain pattern in rule
> implementations (see 2.3), but generate too many rule calls, so join
> planning rules cannot be called together with other rules, which again lead
> to not optimal plans (yet, better than with p.1)
> 3. "Trait pull-up" proposal may fix it. But I have a feeling that pulling
> up possible trait combinations from a child node is indistinguishable from
> child node exploration, so it may be not very efficient again
> 4. A brand new optimizer implementation with recursive top-down approach
> may address all the concerns from p.1-p.3, but appears to be complex to
> implement and may be incompatible with existing rules
>
> Not an easy choice.
>
> Regards,
> Vladimir.
>
> [1]
> https://mail-archives.apache.org/mod_mbox/calcite-dev/201910.mbox/%3CCAJJmzpU9_75O48WeEKgHKg3fTMhK3eAMmHNOVvczj_uUTBxHkA%40mail.gmail.com%3E
>
> сб, 14 мар. 2020 г. в 21:53, Haisheng Yuan <[email protected]>:
>
>> I agree that there should be no global rule queue, we should it do it on
>> per-operator basis, which is also how other major Cascades frameworks do.
>>
>> However, Calcite's VolcanoPlanner doesn't generate unnecessary rule calls
>> as you described. The current process is:
>> 1. global rule queue: ScanRule, ProjectRule
>> 2. Call ScanRule, produce physical scan
>> 3. Call ProjectRule, produce physical project.
>>
>> Even with global rule queue of reversed order ProjectRule, ScanRule,
>> there are still 2 rule calls. In your step 2, ProjectRule doesn't produce
>> physical node, which is incorrect. Any rule is, and should be independent
>> with each other rule. If your rule relies on other operators or rules to be
>> explored first, then you should think about it twice.
>>
>> - Haisheng
>>
>> ------------------------------------------------------------------
>> 发件人：Vladimir Ozerov<[email protected]>
>> 日 期：2020年03月15日 01:50:10
>> 收件人：[email protected] ([email protected])<
>> [email protected]>
>> 主 题：Re: Re: Re: [DISCUSS] Proposal to add API to force rules matching
>> specific rels
>>
>> Hi Roman,
>>
>> In my understanding, the proposed minor changes may only decrease the
>> total
>> number of rule invocations slightly, but all principal problems remain the
>> same. In the top-down approach, you do not want to implement bottom
>> logical
>> nodes unless it is requested explicitly by a parent operator.
>>
>> It seems to me that the very first step to efficient optimizer could be a
>> new rule invocation infrastructure. There should be *no global rule queue*
>> at all. Instead, we may introduce the per-node rule queue. Then, the
>> optimizer performs a recursive top-down optimization dive, invoking the
>> rules for every operator. Consider the following simple tree:
>> -- LogicalProject
>> ---- LogicalScan
>>
>> Assuming that we have two implementation rules ProjectRule, ScanRule, and
>> abstract converters enabled, VolcanoOptimizer will proceed as follows,
>> generating one unnecessary rule call:
>> 1. Define global rule call queue: ProjectRule, ScanRule
>> 2. Call ProjectRule, no new nodes are produced
>> 3. Call ScanRule, produce physical scans, reschedule ProjectRule
>> 4. Call ProjectRule again, produce the physical project
>>
>> With the proposed approach, it will work differently:
>> 1. Define per-operator queues:
>> LogicalProject -> ProjectRule
>> LogicalScan -> ScanRule
>> 2. Call optimize(LogicalProject)
>> 3. Invoke ProjectRule, which calls optimize(LogicalScan) on the input
>> 4. Invoke ScanRule, produce physical scans, return control back to
>> ProjectRule
>> 5. Produce the physical project, finish optimization
>>
>> Now we have only 2 rule invocations as expected, and we reached the same
>> result as in the proposed minor changes. But the crucial difference is
>> that
>> now we have well-defined control flow between operators: start at the top,
>> delegate to children. With this infrastructure in place, we will be able
>> to
>> introduce more complex features, such as pruning, or partial exploration
>> later on.
>>
>> But notice that this change will be incompatible with the current rules,
>> i.e. they should be re-written for the new optimization algorithm (e.g.
>> see
>> step 3), which might be a blocker for the current Calcite users. So maybe
>> it is better to think of a new optimizer, rather than fixing
>> VolcanoOptimizer.
>>
>> Regards,
>> Vladimir.
>>
>>
>> вт, 14 янв. 2020 г. в 23:52, Haisheng Yuan <[email protected]>:
>>
>> > On the other hand, if we don't preprocess and normalize the rel
>> expression
>> > before going to valcano planner, still compute and keep
>> logical/relational
>> > properties, like cardinality, for each operator, how can we achieve
>> group
>> > seach space pruning? Given a physical group expression, its required
>> > property and upper bound cost C_limit, we need to get its child group
>> > cardinality, compute local cost C_local, so that we can calculate the
>> > child group's upper bound cost and pass it down.
>> >
>> > I can't figure out how we can do group pruning without shared relational
>> > properties.
>> >
>> > - Haisheng
>> >
>> > ------------------------------------------------------------------
>> > 发件人：Haisheng Yuan<[email protected]>
>> > 日 期：2020年01月14日 12:06:17
>> > 收件人：[email protected]<[email protected]>
>> > 主 题：Re: Re: [DISCUSS] Proposal to add API to force rules matching
>> specific
>> > rels
>> >
>> > The example is valid if Calcite doesn't do normalization or
>> preprocessing
>> > before going to VolcanoPlanner.
>> > But many databases and big data systems will try to preprocess the
>> > expression (push down predicates etc.) so that expressions in the same
>> > group can share the logical properties, for most case if not all. You
>> may
>> > argue that it should be cost based, e.g. evaluating filter early can
>> still
>> > be bad. It is true, but how accurate will the statistics be, how
>> accurate
>> > will the cost model be?
>> >
>> > - Haisheng
>> >
>> > ------------------------------------------------------------------
>> > 发件人：Julian Hyde<[email protected]>
>> > 日 期：2020年01月13日 08:44:54
>> > 收件人：[email protected]<[email protected]>
>> > 主 题：Re: [DISCUSS] Proposal to add API to force rules matching specific
>> rels
>> >
>> > > MEMO group (RelSet) represents logically equivalent expressions.
>> > > All the expressions in one group should share the same logical
>> > > properties, e.g. functional dependency, constraint properties etc.
>> > > But they are not sharing it. Computation is done for each individual
>> > operator.
>> >
>> > It's good, and correct, that we compute for each individual operator.
>> >
>> > Suppose that a RelSubset s contains RelNode r1 and we know that the
>> > constraint x > 0 holds. Suppose that we also have r2 with constraint y
>> > < 10, and we discover that r1 and r2 are equivalent and belong
>> > together in s. Now we can safely say that both constraints (x > 0 and
>> > y < 10) apply to both r1 and r2.
>> >
>> > Deducing additional constraints in this way is a big win. The effort
>> > to compute constraints for each RelNode is well-spent.
>> >
>> > This kind of deduction applies to other logical properties (e.g.
>> > unique keys) and it applies to RelSet as well as RelSubset.
>> >
>> > Julian
>> >
>> >
>> > On Sun, Jan 12, 2020 at 10:10 AM Roman Kondakov
>> > <[email protected]> wrote:
>> > >
>> > > @Haisheng
>> > >
>> > > > Calcite uses Project operator and all kinds of
>> ProjectXXXTranposeRule
>> > to prune unused columns.
>> > >
>> > > I also noticed that in most cases Project-related rules took
>> significant
>> > > part of the planning time. But I didn't explore this problem yet.
>> > >
>> > > > MEMO group (RelSet) represents logically equivalent expressions. All
>> > the expressions in one group should share the same logical properties,
>> e.g.
>> > functional dependency, constraint properties etc. But they are not
>> sharing
>> > it. Computation is done for each individual operator.
>> > >
>> > > I thought the equivalence of logical properties within groups
>> (RelSets)
>> > > are implicit. For example, in RelSet#addInternal it is always verified
>> > > that the new added node has the same type as other members of the set.
>> > >
>> > > Anyway I absolutely agree with you that problems with traits
>> > > propagation, rules matching and other that you mentioned in the
>> previous
>> > > e-mails should be solved in the first place. We need first to make
>> > > Volcano optimizer work right and only then make some improvements like
>> > > search space pruning.
>> > >
>> > > I would love to join this work to improve Volcano planner. Looking
>> > > forward for design doc.
>> > >
>> > >
>> > > --
>> > > Kind Regards
>> > > Roman Kondakov
>> > >
>> > >
>> > > On 11.01.2020 11:42, Haisheng Yuan wrote:
>> > > > Currently, Calcite uses Project operator and all kinds of
>> > ProjectXXXTranposeRule to prune unused columns. Every operator's output
>> > columns use index to reference child operators' columns. If there is a
>> > Project operator with child operator of a Filter, if we push project
>> down
>> > under Filter, we will have Project-Filter-Project-FilterInput. All the
>> > newly generated relnodes will trigger rule matches. e.g. If we already
>> did
>> > ReduceExpressionRule on filter, but due to the new filter RexCall's
>> input
>> > ref index changed, we have to apply ReduceExpressionRule on the new
>> filter
>> > again, even there is nothing can be reduced. Similarly new operator
>> > transpose/merge rule will be triggered. This can trigger a lot of rule
>> > matches.
>> > > >
>> > > > MEMO group (RelSet) represents logically equivalent expressions. All
>> > the expressions in one group should share the same logical properties,
>> e.g.
>> > functional dependency, constraint properties etc. But they are not
>> sharing
>> > it. Computation is done for each individual operator.
>> > > >
>> > > > Without resolving those issue, space pruning won't help much.
>> > > >
>> > > > There are a lot of room for improvement. Hope the community can join
>> > the effort to make Calcite better.
>> > > >
>> > > > - Haisheng
>> > > >
>> > > > ------------------------------------------------------------------
>> > > > 发件人：Roman Kondakov<[email protected]>
>> > > > 日 期：2020年01月10日 19:39:51
>> > > > 收件人：<[email protected]>
>> > > > 主 题：Re: [DISCUSS] Proposal to add API to force rules matching
>> specific
>> > rels
>> > > >
>> > > > @Haisheng, could you please clarify what you mean by these points?
>> > > >
>> > > >> - the poor-design of column pruning,
>> > > >> - lack of group properties etc.
>> > > >
>> > > > I guess I'm not aware of these problems.
>> > > >
>> >
>> >
>>
>>
>

Re: Re: Re: Re: Re: [DISCUSS] Proposal to add API to force rules matching specific rels

Reply via email to