Hi Haisheng, Thank you for pointing to 1.23.0 changes, I'll try to use this version.
Regards, Vladimir. вс, 15 мар. 2020 г. в 08:05, Haisheng Yuan <[email protected]>: > Thanks for your detailed explanation, Vladimir. > > Indeed, the only way to propagate traits in Calcite currently is using > rules, which is a big pain. I can feel your pain. I tried to come up ways > to implement the trait derivation and requirement in the framwork without > breaking current usages, only turns out it is almost impossible. It has too > many stakeholders, even a small change may incur opposition. > > But before we get the real top-down cascades framework, there are still a > lot you can do to improve your planner's performance. > > Since Calcite 1.22.0, I committed a change that enabes RelSubset to be > used to trigger a rule, which can greatly reduce the number of rule calls > for trait propagation. With your example, you need 2 rules: > 1. Physical implementation rule > match pattern: operand(LogicalProject.class) > Produce PhysicalProject without trait > 2. Project trait propagtion rule > match pattern: operand(PhysicalProject.class, operand(RelSubset.class)) > Produce PhysicalProject with derived trait. > > Since 1.23.0, we removed the rule match importances and ordering, I guess > the can reduce the optimizatino time around 10~20% for some complex queries > with many rule calls. > > - Haisheng > > ------------------------------------------------------------------ > 发件人:Vladimir Ozerov<[email protected]> > 日 期:2020年03月15日 04:18:53 > 收件人:Haisheng Yuan<[email protected]> > 抄 送:[email protected] ([email protected])<[email protected] > > > 主 题:Re: Re: Re: Re: [DISCUSS] Proposal to add API to force rules matching > specific rels > > Hi Haisheng, > > You are right, the behavior I showed is not the default one, I should > provide more context here. This is how we (Hazelcast) and at least Drill > use the engine to ensure that the produced plan is optimal, I gave an > example in [1]. > In real distributed engines, we rely on physical properties heavily. Most > notably, distribution and collation. And the fundamental problem with the > VolcanoOptimizer, is that it cannot propagate traits in a controlled > manner. This, in turn, forces us to use AbstractConverters and implement > rules in ways, which appear strange to Calcite community :-). And this, in > turn, leads to excessive rule calls and failure to plan complex queries. > > Let's consider the same tree again, but now assuming that this is not the > complete tree, but a subtree, and there are some parent operators. Let's > also assume that the ScanRule may produce two equivalent operators with > different physical properties: PhysicalScan and PhysicalIndexScan[a ASC]. > It is important to consider both alternatives in parent operators. Now > let's consider two different ways to optimize that subtree. > > 1. Canonical Calcite way (default) > 1.1 Perform initial rules ordering, parent rules fire first: [ProjectRule, > ScanRule] > 1.2 Invoke ProjectRule, which produces physical project without any > physical traits > 1.3 Invoke ScanRule, which produces, PhysicalScan and PhysicalIndexScan[a > ASC] > Since ProjectRule was not aware of different scan alternatives, it missed > collation, and resulting hypergraph looks like this: > > -- PhysicalProject > ---- [PhysicalScan, PhysicalIndexScan[a ASC]] > > This plan is suboptimal, because of parent operators cannot take advantage > of collation. > > 2. Hazelast/Drill way: > 2.1 Enable abstract converters > 2.2 Rules are ordered in the same way as in example 1: [ProjectRule, > ScanRule] > 2.3 ProjectRule fires, enumerates physical implementations of the input. > Since there are no physical inputs yet, it exits without any > transformations > 2.4 ScanRule fires and produces two physical scans > 2.5 Abstract converters ensure that the ProjectRule is scheduled for > execution again because it's input RelSet has new nodes > 2.6 ProjectRule fires again, now having physical inputs, and generates > one implementation per unique combination of physical input traits. > > As a result, we have two graphs now: > > Graph 1: > -- PhysicalProject > ---- PhysicalScan > > Graph 2: > -- PhysicalProject[a ASC] > ---- PhysicalIndexScan[a ASC] > > Notice how we propagated physical collation. Now parent operators may take > advantage of it, e.g. eliminate sorting, or perform streaming aggregation > instead of blocking hash aggregation. > > This is the fundamental problem we have in Hazelcast: how to ensure the > complete exploration of the search space without excessive rule calls. > > Very short summary: > 1. The default behavior of VolcanoOptimizer cannot explore the physical > search space, so plans are not optimal > 2. Abstract converters fix this if you follow a certain pattern in rule > implementations (see 2.3), but generate too many rule calls, so join > planning rules cannot be called together with other rules, which again lead > to not optimal plans (yet, better than with p.1) > 3. "Trait pull-up" proposal may fix it. But I have a feeling that pulling > up possible trait combinations from a child node is indistinguishable from > child node exploration, so it may be not very efficient again > 4. A brand new optimizer implementation with recursive top-down approach > may address all the concerns from p.1-p.3, but appears to be complex to > implement and may be incompatible with existing rules > > Not an easy choice. > > Regards, > Vladimir. > > [1] > https://mail-archives.apache.org/mod_mbox/calcite-dev/201910.mbox/%3CCAJJmzpU9_75O48WeEKgHKg3fTMhK3eAMmHNOVvczj_uUTBxHkA%40mail.gmail.com%3E > > сб, 14 мар. 2020 г. в 21:53, Haisheng Yuan <[email protected]>: > >> I agree that there should be no global rule queue, we should it do it on >> per-operator basis, which is also how other major Cascades frameworks do. >> >> However, Calcite's VolcanoPlanner doesn't generate unnecessary rule calls >> as you described. The current process is: >> 1. global rule queue: ScanRule, ProjectRule >> 2. Call ScanRule, produce physical scan >> 3. Call ProjectRule, produce physical project. >> >> Even with global rule queue of reversed order ProjectRule, ScanRule, >> there are still 2 rule calls. In your step 2, ProjectRule doesn't produce >> physical node, which is incorrect. Any rule is, and should be independent >> with each other rule. If your rule relies on other operators or rules to be >> explored first, then you should think about it twice. >> >> - Haisheng >> >> ------------------------------------------------------------------ >> 发件人:Vladimir Ozerov<[email protected]> >> 日 期:2020年03月15日 01:50:10 >> 收件人:[email protected] ([email protected])< >> [email protected]> >> 主 题:Re: Re: Re: [DISCUSS] Proposal to add API to force rules matching >> specific rels >> >> Hi Roman, >> >> In my understanding, the proposed minor changes may only decrease the >> total >> number of rule invocations slightly, but all principal problems remain the >> same. In the top-down approach, you do not want to implement bottom >> logical >> nodes unless it is requested explicitly by a parent operator. >> >> It seems to me that the very first step to efficient optimizer could be a >> new rule invocation infrastructure. There should be *no global rule queue* >> at all. Instead, we may introduce the per-node rule queue. Then, the >> optimizer performs a recursive top-down optimization dive, invoking the >> rules for every operator. Consider the following simple tree: >> -- LogicalProject >> ---- LogicalScan >> >> Assuming that we have two implementation rules ProjectRule, ScanRule, and >> abstract converters enabled, VolcanoOptimizer will proceed as follows, >> generating one unnecessary rule call: >> 1. Define global rule call queue: ProjectRule, ScanRule >> 2. Call ProjectRule, no new nodes are produced >> 3. Call ScanRule, produce physical scans, reschedule ProjectRule >> 4. Call ProjectRule again, produce the physical project >> >> With the proposed approach, it will work differently: >> 1. Define per-operator queues: >> LogicalProject -> ProjectRule >> LogicalScan -> ScanRule >> 2. Call optimize(LogicalProject) >> 3. Invoke ProjectRule, which calls optimize(LogicalScan) on the input >> 4. Invoke ScanRule, produce physical scans, return control back to >> ProjectRule >> 5. Produce the physical project, finish optimization >> >> Now we have only 2 rule invocations as expected, and we reached the same >> result as in the proposed minor changes. But the crucial difference is >> that >> now we have well-defined control flow between operators: start at the top, >> delegate to children. With this infrastructure in place, we will be able >> to >> introduce more complex features, such as pruning, or partial exploration >> later on. >> >> But notice that this change will be incompatible with the current rules, >> i.e. they should be re-written for the new optimization algorithm (e.g. >> see >> step 3), which might be a blocker for the current Calcite users. So maybe >> it is better to think of a new optimizer, rather than fixing >> VolcanoOptimizer. >> >> Regards, >> Vladimir. >> >> >> вт, 14 янв. 2020 г. в 23:52, Haisheng Yuan <[email protected]>: >> >> > On the other hand, if we don't preprocess and normalize the rel >> expression >> > before going to valcano planner, still compute and keep >> logical/relational >> > properties, like cardinality, for each operator, how can we achieve >> group >> > seach space pruning? Given a physical group expression, its required >> > property and upper bound cost C_limit, we need to get its child group >> > cardinality, compute local cost C_local, so that we can calculate the >> > child group's upper bound cost and pass it down. >> > >> > I can't figure out how we can do group pruning without shared relational >> > properties. >> > >> > - Haisheng >> > >> > ------------------------------------------------------------------ >> > 发件人:Haisheng Yuan<[email protected]> >> > 日 期:2020年01月14日 12:06:17 >> > 收件人:[email protected]<[email protected]> >> > 主 题:Re: Re: [DISCUSS] Proposal to add API to force rules matching >> specific >> > rels >> > >> > The example is valid if Calcite doesn't do normalization or >> preprocessing >> > before going to VolcanoPlanner. >> > But many databases and big data systems will try to preprocess the >> > expression (push down predicates etc.) so that expressions in the same >> > group can share the logical properties, for most case if not all. You >> may >> > argue that it should be cost based, e.g. evaluating filter early can >> still >> > be bad. It is true, but how accurate will the statistics be, how >> accurate >> > will the cost model be? >> > >> > - Haisheng >> > >> > ------------------------------------------------------------------ >> > 发件人:Julian Hyde<[email protected]> >> > 日 期:2020年01月13日 08:44:54 >> > 收件人:[email protected]<[email protected]> >> > 主 题:Re: [DISCUSS] Proposal to add API to force rules matching specific >> rels >> > >> > > MEMO group (RelSet) represents logically equivalent expressions. >> > > All the expressions in one group should share the same logical >> > > properties, e.g. functional dependency, constraint properties etc. >> > > But they are not sharing it. Computation is done for each individual >> > operator. >> > >> > It's good, and correct, that we compute for each individual operator. >> > >> > Suppose that a RelSubset s contains RelNode r1 and we know that the >> > constraint x > 0 holds. Suppose that we also have r2 with constraint y >> > < 10, and we discover that r1 and r2 are equivalent and belong >> > together in s. Now we can safely say that both constraints (x > 0 and >> > y < 10) apply to both r1 and r2. >> > >> > Deducing additional constraints in this way is a big win. The effort >> > to compute constraints for each RelNode is well-spent. >> > >> > This kind of deduction applies to other logical properties (e.g. >> > unique keys) and it applies to RelSet as well as RelSubset. >> > >> > Julian >> > >> > >> > On Sun, Jan 12, 2020 at 10:10 AM Roman Kondakov >> > <[email protected]> wrote: >> > > >> > > @Haisheng >> > > >> > > > Calcite uses Project operator and all kinds of >> ProjectXXXTranposeRule >> > to prune unused columns. >> > > >> > > I also noticed that in most cases Project-related rules took >> significant >> > > part of the planning time. But I didn't explore this problem yet. >> > > >> > > > MEMO group (RelSet) represents logically equivalent expressions. All >> > the expressions in one group should share the same logical properties, >> e.g. >> > functional dependency, constraint properties etc. But they are not >> sharing >> > it. Computation is done for each individual operator. >> > > >> > > I thought the equivalence of logical properties within groups >> (RelSets) >> > > are implicit. For example, in RelSet#addInternal it is always verified >> > > that the new added node has the same type as other members of the set. >> > > >> > > Anyway I absolutely agree with you that problems with traits >> > > propagation, rules matching and other that you mentioned in the >> previous >> > > e-mails should be solved in the first place. We need first to make >> > > Volcano optimizer work right and only then make some improvements like >> > > search space pruning. >> > > >> > > I would love to join this work to improve Volcano planner. Looking >> > > forward for design doc. >> > > >> > > >> > > -- >> > > Kind Regards >> > > Roman Kondakov >> > > >> > > >> > > On 11.01.2020 11:42, Haisheng Yuan wrote: >> > > > Currently, Calcite uses Project operator and all kinds of >> > ProjectXXXTranposeRule to prune unused columns. Every operator's output >> > columns use index to reference child operators' columns. If there is a >> > Project operator with child operator of a Filter, if we push project >> down >> > under Filter, we will have Project-Filter-Project-FilterInput. All the >> > newly generated relnodes will trigger rule matches. e.g. If we already >> did >> > ReduceExpressionRule on filter, but due to the new filter RexCall's >> input >> > ref index changed, we have to apply ReduceExpressionRule on the new >> filter >> > again, even there is nothing can be reduced. Similarly new operator >> > transpose/merge rule will be triggered. This can trigger a lot of rule >> > matches. >> > > > >> > > > MEMO group (RelSet) represents logically equivalent expressions. All >> > the expressions in one group should share the same logical properties, >> e.g. >> > functional dependency, constraint properties etc. But they are not >> sharing >> > it. Computation is done for each individual operator. >> > > > >> > > > Without resolving those issue, space pruning won't help much. >> > > > >> > > > There are a lot of room for improvement. Hope the community can join >> > the effort to make Calcite better. >> > > > >> > > > - Haisheng >> > > > >> > > > ------------------------------------------------------------------ >> > > > 发件人:Roman Kondakov<[email protected]> >> > > > 日 期:2020年01月10日 19:39:51 >> > > > 收件人:<[email protected]> >> > > > 主 题:Re: [DISCUSS] Proposal to add API to force rules matching >> specific >> > rels >> > > > >> > > > @Haisheng, could you please clarify what you mean by these points? >> > > > >> > > >> - the poor-design of column pruning, >> > > >> - lack of group properties etc. >> > > > >> > > > I guess I'm not aware of these problems. >> > > > >> > >> > >> >> >
