Hi everyone! Haisheng, thank you for bringing this subject up. A new Cascades-style optimizer should definitely be the next step for Apache Calcite. Many projects suffer from the lack of such an optimizer.
That is why, several weeks ago, I started working on a prototype of a Cascades optimizer for Apache Calcite. I was not sure I would build something useful without much experience in this area, but now I'd like to share my work with the community. You can find the Cascades prototype in PR [1]. The prototype is based on the well-known paper [2].

What the prototype can do:
- Top-down trait request
- Trait conversion without AbstractConverters
- Top-down rule application
- Bottom-up trait derivation

What is not supported yet:
- Search space pruning (but I'm going to add it in the next commits)
- Materialized views
- Planner hooks
- Hints
- Backward compatibility with the Volcano planner (some research needed)

I prepared a design doc for this planner [3]; you can find many details there. I have also opened it for comments. I've written several basic test cases in org.apache.calcite.plan.cascades.CascadesPlannerTest, including the one that was discussed earlier in this thread in the context of trait requests:

> MergeJoin hash[a]
> |---- TableScan R hash[a] (RelSubset)
> +---- TableScan S hash[a] (RelSubset)

Haisheng, this is very similar to what you propose. Answering your question:

> There are 2 ways:
> a) Modify the current VolcanoPlanner.
> Pros: code reuse, existing numerous test cases and infrastructure, fast integration
> Cons: changing code always brings risk
>
> b) Add a new XXXXPlanner.
> Pros: no risk, no tech debt, no need to worry about backward compatibility
> Cons: separate test cases for the new planner, one more planner to maintain

I've chosen the second approach, because I don't have a clear understanding of how to fix the Volcano planner gradually. This new planner is very far from perfect, but I think it can be a good starting point for the community. Please share your thoughts about this planner.
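As an illustration of the trait-request test case above, here is a minimal, self-contained sketch in plain Java. All names here (HashDist, the passThrough-style method) are invented stand-ins for this example, not Calcite's actual classes:

```java
import java.util.List;

// Toy model of a top-down trait request, as in the MergeJoin example above.
// HashDist and passThrough are invented stand-ins, not Calcite's API.
public class TraitRequestSketch {

    /** A distribution trait: rows hash-partitioned by the given key columns. */
    record HashDist(List<String> keys) {
        /** hash[a,b] does NOT satisfy hash[a]: rows equal on a alone may differ
         *  on b and land on different nodes, so in this toy model only an exact
         *  key match satisfies a requirement. */
        boolean satisfies(HashDist required) {
            return keys.equals(required.keys());
        }
    }

    /** A merge join on joinKeys can pass a required distribution down to both
     *  inputs when the required keys are all join keys; otherwise it returns
     *  null, meaning the request cannot be passed through. */
    static List<HashDist> passThrough(List<String> joinKeys, HashDist required) {
        if (joinKeys.containsAll(required.keys())) {
            return List.of(required, required); // same request for both children
        }
        return null;
    }

    public static void main(String[] args) {
        // Agg requires hash[a] from a MergeJoin on [a, b]: the request passes
        // down, so both TableScan subsets are asked for hash[a], as above.
        HashDist required = new HashDist(List.of("a"));
        System.out.println(passThrough(List.of("a", "b"), required));
        // A join on [c] could not use hash[a] at all:
        System.out.println(passThrough(List.of("c"), required));
    }
}
```

Requesting the trait at the join and forwarding it to both inputs is what removes the need for AbstractConverter nodes between the subsets.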
[1] PR: https://github.com/apache/calcite/pull/1948
[2] Paper: https://15721.courses.cs.cmu.edu/spring2019/papers/22-optimizer1/xu-columbia-thesis1998.pdf
[3] Design doc: https://docs.google.com/document/d/1qaV3eSKTw4gfLuBR3XB_LVIc51hxIng9k1J4VD6qnRg/edit?usp=sharing

--
Kind Regards
Roman Kondakov

On 22.04.2020 09:52, Danny Chan wrote:
>> Is there any recommended approach to make that happen smoothly besides
>> coding and testing work? We need to be aware that the new planner might
>> co-exist with VolcanoPlanner for 5 or more years, or even never replace
>> VolcanoPlanner.
>
> If that is true, I might say the new planner probably does not have that
> good a design. We should see in advance for what cases/reasons users have
> a reason to keep the old VolcanoPlanner, and we *must* give a solution
> for those problems in the new design.
>
> I was expecting that migrating to a new planner would take at least 1
> year of development. If that is true, modifying the current planner
> directly means that for the next 3~4 Calcite versions there would be
> huge plan changes/bugs in each release, which I believe all the users of
> Calcite don't want to see. And no one can guarantee that modifying
> directly can keep good stability and compatibility; only the test set
> does.
>
> From the experience of the Alibaba Blink planner, which has been
> contributed to Apache Flink, the old and new planners would co-exist for
> at least 2 years, because the new and old planners have different
> abilities in some corner cases.
>
> From my point of view, we should at least:
> - Give a convincing test set for the new planner that makes us believe
> the new planner is stable and powerful enough. Obviously, the current
> rule tests are far from enough to support the new planner.
> - Give a more detailed design doc about the new planner, especially
> about the interface changes and any change that would bring in
> compatibility problems.
> Then we can make a more accurate decision about how much work the new
> planner would bring in. Until then, we can decide whether switching to
> pure new-planner development is a good idea, or modifying the existing
> one.
>
> Haisheng Yuan <hy...@apache.org> wrote on Wed, Apr 22, 2020 at 09:45:
>
>> Hi Andrii,
>>
>>> Obviously, from what is written here, I could guess that this would
>>> require me to change my physical planning rules, even if only by
>>> implementing a marker interface.
>> You don't need to change your physical rules; they will be treated the
>> same as logical rules and applied together with the real logical rules,
>> with no more logical/physical rule difference. This is also how the
>> current VolcanoPlanner works.
>>
>>> I don't want you to think that I somehow resent the changes you are
>>> pushing.
>> Don't get me wrong. I am seriously thinking of reverting these changes.
>> Since most people like the idea of adding a new planner, why don't we
>> make all the plan changes in the new planner, instead of forcing people
>> to change test cases for code changes in VolcanoPlanner that they might
>> not need during an upgrade?
>>
>> I didn't intend to replace VolcanoPlanner; I thought of just changing
>> the search strategy and adding a trait derivation mechanism, because
>> most of the code in VolcanoPlanner can be reused. But since many agree
>> to add a new planner, with replacing VolcanoPlanner as the final goal,
>> I won't go against most people's decision.
>>
>> Is there any recommended approach to make that happen smoothly besides
>> coding and testing work? We need to be aware that the new planner might
>> co-exist with VolcanoPlanner for 5 or more years, or even never replace
>> VolcanoPlanner.
>>
>> More thoughts are welcome.
>> >> Haisheng >> >> On 2020/04/21 19:56:25, Андрей Цвелодуб <a.tsvelo...@gmail.com> wrote: >>> Hello Haisheng, >>> >>>> To keep backward compatibility, all the un-marked rules will be treated >>> as logical rules, except rules that uses AbstractConverter as rule >> operand, >>> these rules still need to applied top-down, or random order. >>> Obviously, from what is written here, I could guess that this would >> require >>> me to change my physical planning rules, even if only by implementing a >>> marker interface. I am not saying this is a bad thing, but this is a >> thing >>> that should be communicated and planned ahead in case the VolcanoPlanner >> is >>> modified. >>> >>>> Looks like I have to revert changes in CALCITE-2970 and CALCITE-3753, >>> because they will cause another tons of plan changes. >>> I see you are still bitter due to all the discussions on this list >> lately, >>> I'm sorry. I don't want you to think that I somehow resent the changes >> you >>> are pushing, au contraire I support them and would be happy to help if I >>> can. I just want the process of these changes to be executed in the best >>> possible way. >>> As I see there are already several opinions in this thread that basically >>> align with what I am saying, so I guess I am not the crazy guy running >>> around and yelling "the end is nigh!". >>> >>> Thank you for taking these mumbled thoughts into account. >>> >>> Bestest Regards, >>> Andrii Tsvielodub >>> >>> On Tue, 21 Apr 2020 at 21:08, Haisheng Yuan <hy...@apache.org> wrote: >>> >>>> Hi Andrii, >>>> >>>>> I guess changing the planner would lead to changes in tons of rules >> and >>>> even more tests. >>>> Obviously you didn't read through my email. You are not required to do >> any >>>> changes to your rule if you don't want to, but if you do, just need to >> mark >>>> the rule to tell planner whether it is a physical rule or not, simply >> by >>>> implementing an empty interface. 
>>>> >>>>> many on this list already experienced problems with upgrading even >>>> between the minor versions of Calcite. >>>> Sorry to see the problem you have experienced when upgrading Calcite. >>>> Looks like I have to revert changes in CALCITE-2970 and CALCITE-3753, >>>> because they will cause another tons of plan changes. >>>> >>>> But I will see if I can add a setting to use the old search strategy, >>>> which can be left untouched. >>>> >>>> Haisheng >>>> >>>> On 2020/04/21 06:33:08, Андрей Цвелодуб <a.tsvelo...@gmail.com> wrote: >>>>> Hello everyone, >>>>> >>>>> First of all, thanks for this great effort of improving the core >> parts of >>>>> the framework we all are using, >>>>> I believe this is long overdue and hope this will have benefits both >> for >>>>> the maintainers and users of the library. >>>>> >>>>> I don't have anything to say about the general idea at the moment, >>>>> but I want to make a point that maintaining the old implementation of >>>>> VolcanoPlanner during >>>>> the initial stages of implementing the new planner is absolutely >>>> CRITICAL. >>>>> As a lot of users of Calcite do various customizations to the >> engine, to >>>>> the rules >>>>> and all that is there in between, I believe changing the >> implementation >>>> of >>>>> the core component >>>>> would have a huge impact on most users of the library. I think many >> on >>>> this >>>>> list >>>>> already experienced problems with upgrading even between the minor >>>> versions >>>>> of Calcite, >>>>> so I guess changing the planner would lead to changes in tons of >> rules >>>> and >>>>> even more tests. >>>>> >>>>> I don't have anything against replacing VolcanoPlanner as a final >> goal of >>>>> this effort, >>>>> but I don't think that modifying it directly and merging it to >> master is >>>> a >>>>> viable development approach. 
>>>>> While I understand how burdensome it is to maintain several parallel >> core >>>>> components at once >>>>> (we did this while moving the engine of our product to Calcite), we >>>> should >>>>> still respect those who depend >>>>> on it and not introduce the risks related to the development of a new >>>>> component into existing processing flows. >>>>> >>>>> A good model to try to follow would be the way new Garbage >> Collectors are >>>>> introduced in Java. >>>>> First, add it as an experimental option, then make it generally >>>> available, >>>>> then after everyone agrees >>>>> this is the best option - make it the default one. >>>>> With this approach, everyone can then move to the new planner at >> their >>>> own >>>>> pace, guaranteeing a smooth transition overall. >>>>> Yes, this could take some time, maybe even a year, but this is the >> price >>>> of >>>>> doing major changes in a popular framework. >>>>> >>>>> Again, thank you for initiating this discussion and leading this >> effort. >>>>> >>>>> Best Regards, >>>>> Andrii Tsvielodub >>>>> >>>>> On Tue, 21 Apr 2020 at 07:51, Jinpeng Wu <wjpabc...@gmail.com> >> wrote: >>>>> >>>>>> Hi, Xiening. >>>>>> >>>>>> Regarding calculating the logical cost, here are some ways I >> though: >>>>>> 1. Logical rel may implement their own computeSelfCost method. Some >>>>>> rels can provide such information, for example the >>>>>> LogicalProject/LogicalFilter contains nearly the same information >> as >>>> their >>>>>> physical implementations. If we don't have enough confidence, just >>>> return >>>>>> zeroCost is also OK, as it only affects pruning. >>>>>> 2. Logical rel tells its parents what its physical input could be >>>> after >>>>>> implementation. Then the problem come back to calculating lower >> bound >>>> of a >>>>>> physical rel. >>>>>> There should always be ways. The only problem is how to find a >> pretty >>>> one. >>>>>> >>>>>> Regarding the risk, new planner do have different risk. 
It is not >>>> because >>>>>> new planner could stop us doing something wrong but we can decide >> when >>>> to >>>>>> use the new one. Some scenarios: >>>>>> 1. If modifying the VolcanoPlanner directly, the only way user >> could >>>>>> control the risk is not to upgrade calcite version until it is >>>> considered >>>>>> stable. You know, it is quite different from keeping calcite >> updated >>>> and >>>>>> switching to the new planner at a proper time. >>>>>> 2. It is very importance for SLA control. For the important >> business >>>> and >>>>>> jobs, we may keep using the old and stable planner. And use the >> new one >>>>>> only for jobs that have fault tolerance. And this helps testing new >>>> planner >>>>>> with actual scenarios. >>>>>> 3. It is helpful when upgrading online services. When the new >> planner >>>>>> happened to have some bugs, we can switch to the old planner >> directly >>>>>> without rollback the whole service. >>>>>> 4. With all these ways to prevent issues becoming disasters, we >> are not >>>>>> vulnerable to making mistakes. This not only enables faster >> iterations >>>> but >>>>>> also let us have enough time to resolve big bugs, like considering >> it >>>> in >>>>>> detail and applying a time-consuming refactoring for it. To work >>>> around a >>>>>> critical bug using tricky ways usually introduces more issues. >>>>>> >>>>>> Thanks, >>>>>> Jinpeng >>>>>> >>>>>> On Tue, Apr 21, 2020 at 2:04 AM Xiening Dai <xndai....@gmail.com> >>>> wrote: >>>>>> >>>>>>> Hi Jinpeng, >>>>>>> >>>>>>> Regarding this comment - I believe there are ways to calculate >> the >>>>>> logical >>>>>>> cost, but I think it’s not that simple as "cardinality * >>>>>> unit_copy_cost.”, >>>>>>> would you provide more details of other different ways? Just the >>>>>> algorithm >>>>>>> description or pseudo code would help us understand. Thanks. >>>>>>> >>>>>>> Regarding the approach of creating new planner, I don’t think a >> new >>>>>>> planner would lower the risk. 
We don’t know what we don’t know. >> If we >>>>>>> introduced an issue while modifying the planner, most likely we >>>> would do >>>>>>> the same with new planner class. A new planner doesn’t >> necessarily >>>>>> prevent >>>>>>> the issue from happening, but just delay its surfacing, which is >>>> worse >>>>>> IMO. >>>>>>> >>>>>>> There’s one obvious benefit with new planner is that we can >> provide >>>> some >>>>>>> sort of isolation so the change won’t cause test baseline >> updates, >>>> which >>>>>>> could be painful at times. We should see if we could use planner >>>> config >>>>>> to >>>>>>> achieve the same if we decide to just modify the current planner. >>>>>>> >>>>>>> >>>>>>>> On Apr 20, 2020, at 8:32 AM, Haisheng Yuan <hy...@apache.org> >>>> wrote: >>>>>>>> >>>>>>>> Igor, >>>>>>>> >>>>>>>> That's great. >>>>>>>> >>>>>>>> On 2020/04/20 11:17:49, Seliverstov Igor <gvvinbl...@gmail.com >>> >>>> wrote: >>>>>>>>> Haisheng, Xiening, >>>>>>>>> >>>>>>>>> Ok, Now I see how it should work. >>>>>>>>> >>>>>>>>> Thanks for your replies. >>>>>>>>> >>>>>>>>> Regards, >>>>>>>>> Igor >>>>>>>>> >>>>>>>>>> 20 апр. 2020 г., в 09:56, Seliverstov Igor < >> gvvinbl...@gmail.com >>>>> >>>>>>> написал(а): >>>>>>>>>> >>>>>>>>>> Haisheng, Xiening, >>>>>>>>>> >>>>>>>>>> Thanks for clarifying. >>>>>>>>>> >>>>>>>>>> In this proposal, we are not trying to split logical and >> physical >>>>>>> planning entirely. - actually I was in doubt about an idea of >> entire >>>>>>> splitting logical and physical phases, if you aren't going to, I >>>> have no >>>>>>> objections. >>>>>>>>>> >>>>>>>>>> But it returns me to my first question: how we will propagate >>>> traits >>>>>>> in bottom-up manner using proposed approach. (See [DISCUSS] >> Proposal >>>> to >>>>>> add >>>>>>> API to force rules matching specific rels for details) >>>>>>>>>> >>>>>>>>>> One of inconveniences of current VolcanoPlanner >> implementation is >>>>>>> amount of tricks that we need to get desired behaviour. 
It would >> be >>>> great >>>>>>> if some of issues (or all of them) were solved in the new >> approach. >>>>>>>>>> >>>>>>>>>> Regards, >>>>>>>>>> Igor >>>>>>>>>> >>>>>>>>>> пн, 20 апр. 2020 г., 7:02 Xiening Dai <xndai....@gmail.com >>>> <mailto: >>>>>>> xndai....@gmail.com>>: >>>>>>>>>> Hi Igor, >>>>>>>>>> >>>>>>>>>> Your comment - "because actual cost may be calculated >> correctly >>>> using >>>>>>> physical operators only. So won't be able to implement Branch and >>>> Bound >>>>>>> Space Pruning.“ is actually not true. In Cascade’s lower bound / >>>> upper >>>>>>> bound pruning algorithm, you can get cost lower bound of input >>>> RelNode >>>>>>> using cardinality * unit_copy_cost. The unit_copy_cost is a >> constant, >>>>>> which >>>>>>> stands for the minimal cost for processing a tuple from the >> input. So >>>>>> this >>>>>>> will be the minimal cost the input RelNode can achieve, and if >> this >>>> is >>>>>>> indeed larger than the current best cost, this input node can be >>>> pruned. >>>>>>>>>> >>>>>>>>>> In this proposal, we are not trying to split logical and >> physical >>>>>>> planning entirely. But for any given equivalent set, we would >> need to >>>>>>> finish exploration before implementation. But for the entire memo >>>> tree, >>>>>>> each set could be in different planning stage, and an equivalent >> set >>>> can >>>>>> be >>>>>>> pruned even before it’s implemented. A text book example of >>>>>> aforementioned >>>>>>> “lower bound / upper bound pruning” would be the join ordering >> case. >>>>>>>>>> >>>>>>>>>> Regarding #3, I think we can still achieve that (partially) >>>> through >>>>>>> this proposal. Remember every time when we start the >> optimization, we >>>>>> pass >>>>>>> down an upper bound limit. Initially this upper bound for root is >>>>>> infinite >>>>>>> - meaning that no feasible plan is available. Every time when we >>>> find a >>>>>>> physical plan we update this upper bound, then start the search >>>> again. 
We >>>>>>> could stop the search when the cost is less than a pre-defined >>>> threshold >>>>>> - >>>>>>> which gives you a “good enough” plan with early termination. >> Still >>>> this >>>>>>> wouldn't avoid the logical exploration. For that, you would >> probably >>>>>>> archive through rule configurations, and avoid some expansive >>>>>>> transformation to keep the cost down. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> On Apr 19, 2020, at 7:30 PM, Haisheng Yuan < >> hy...@apache.org >>>>>> <mailto: >>>>>>> hy...@apache.org>> wrote: >>>>>>>>>>> >>>>>>>>>>> Igor, >>>>>>>>>>> >>>>>>>>>>> a) Given current Calcite's stats derivation strategy, mixing >>>> logical >>>>>>> and physical planning won't make it better. I hope you went >> through >>>> my >>>>>>> email to the end, currently operators inside a memo group don't >> share >>>>>> stats >>>>>>> info, each operator's stats may differ with the other one, and >> the >>>>>>> RelSubset's best node and stats may vary during optimization. So >>>> logical >>>>>>> and physical rule pruning are not safe at the moment, otherwise >> it >>>> almost >>>>>>> implies changing downstream project's cost model silently. >>>>>>>>>>> >>>>>>>>>>> On the other hand, ensuring child group exploration task >> finish >>>>>> first >>>>>>> will make rule mutual exclusivity check possible, like the >> result of >>>>>>> ReduceExpressionRule won't need trigger the same rule again, The >> join >>>>>>> result of JoinCommuteRule won't need trigger JoinCommuteRule and >>>>>>> ReduceExpressionRule again. >>>>>>>>>>> >>>>>>>>>>> More importantly, most if not all the the long planning >> queries >>>> in >>>>>>> Calcite are not caused by too many alternatives, but mutual >>>> triggering, >>>>>> or >>>>>>> cyclic triggering, which can be avoided by customizing the rules >>>> instead >>>>>> of >>>>>>> using the default one. 
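The lower-bound pruning arithmetic Xiening describes above (cost lower bound = cardinality * unit_copy_cost, prune a group once even its lower bound exceeds the cheapest complete plan found so far) can be sketched as follows. The constant and the sample numbers are invented for illustration:

```java
// Sketch of the lower-bound / upper-bound pruning check described above.
// UNIT_COPY_COST and the sample numbers are invented for illustration.
public class LowerBoundPruning {

    /** Assumed minimal cost for processing one tuple from an input. */
    static final double UNIT_COPY_COST = 0.01;

    /** Any physical plan for the input must at least touch every tuple once,
     *  so cardinality * UNIT_COPY_COST is a valid cost lower bound. */
    static double lowerBound(double cardinality) {
        return cardinality * UNIT_COPY_COST;
    }

    /** Prune the input group if even its lower bound is worse than the
     *  cheapest complete plan found so far (the current upper bound). */
    static boolean canPrune(double inputCardinality, double bestCostSoFar) {
        return lowerBound(inputCardinality) > bestCostSoFar;
    }

    public static void main(String[] args) {
        double best = 5.0; // cost of the best complete plan found so far
        System.out.println(canPrune(1_000_000, best)); // lower bound 10000: prune
        System.out.println(canPrune(100, best));       // lower bound 1: keep exploring
    }
}
```

The point of the check is that it needs only the input's cardinality estimate, so a group can be pruned before any of its physical alternatives are built.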
Unless you use dynamic programming >> (Calcite >>>> use >>>>>>> heuristic) on join reordering (left-deep, right-deep, bushy), >> space >>>>>> pruning >>>>>>> won't help make long / infinite running query faster. >>>>>>>>>>> >>>>>>>>>>> b) No evidence shows current version of Calcite will return >> the >>>> most >>>>>>> promising plan in first planning iteration. Instead of praying >> for >>>>>> getting >>>>>>> good enough plan in the first iteration, why not focus on fixing >>>> rules >>>>>> that >>>>>>> causes the issue? >>>>>>>>>>> >>>>>>>>>>> c) That is not the goal. >>>>>>>>>>> >>>>>>>>>>> On 2020/04/19 15:14:57, Seliverstov Igor < >> gvvinbl...@gmail.com >>>>>>> <mailto:gvvinbl...@gmail.com>> wrote: >>>>>>>>>>>> Haisheng, >>>>>>>>>>>> >>>>>>>>>>>> From my point of view splitting logical and physical >> planning >>>> parts >>>>>>> isn’t a good idea. >>>>>>>>>>>> >>>>>>>>>>>> I think so because actual cost may be calculated correctly >>>> using >>>>>>> physical operators only. So that we: >>>>>>>>>>>> a) won't be able to implement Branch and Bound Space >> Pruning >>>> (as >>>>>> far >>>>>>> as I understand, at exploring time there are no physical >> operators, >>>> no >>>>>>> correct costs, but assumptions only, I don’t think we should >> rely on >>>>>> them) >>>>>>>>>>>> b) won’t be able to get good enough plan (without Branch >> and >>>> Bound >>>>>>> Space Pruning it’s non-trivial task to get right search >> direction and >>>>>> most >>>>>>> promising plans in first planning iterations) >>>>>>>>>>>> c) won’t be able to tune the good enough plan during >> several >>>>>> similar >>>>>>> queries are executed >>>>>>>>>>>> >>>>>>>>>>>> Regards, >>>>>>>>>>>> Igor >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>> 19 апр. 2020 г., в 17:37, Haisheng Yuan <hy...@apache.org >>>>>> <mailto: >>>>>>> hy...@apache.org>> написал(а): >>>>>>>>>>>>> >>>>>>>>>>>>> Hi Igor, >>>>>>>>>>>>> >>>>>>>>>>>>> You can't have your cake and eat it. 
>>>>>>>>>>>>> But one feasible work item definitely we can do is that >> once >>>>>>> timeout, stop exploring, use the first available physical >> operator in >>>>>> each >>>>>>> group and optimize it. >>>>>>>>>>>>> >>>>>>>>>>>>> Because most, if not all, of the long / infinite running >>>>>>> optimizations are caused by project related transpose, join >>>> reordering >>>>>>> (join commute + associative), constant reduction for large >> expression >>>>>> (see >>>>>>> CALCITE-3460), all of which are logical transformations rules and >>>> many of >>>>>>> which have corresponding JIRAs. So another alternative is, let's >> fix >>>>>> these >>>>>>> bugs to improve Calcite to make timeout option less usable. >>>>>>>>>>>>> >>>>>>>>>>>>> Another thing worth noting is that sql server optimizer >>>> timeout >>>>>>> mechanism is not based on clock time, but the number of >> optimization >>>>>> tasks >>>>>>> it has done [1]. >>>>>>>>>>>>> >>>>>>>>>>>>> [1] >>>>>>> >>>>>> >>>> >> https://techcommunity.microsoft.com/t5/sql-server-support/understanding-optimizer-timeout-and-how-complex-queries-can-be/ba-p/319188 >>>>>>> < >>>>>>> >>>>>> >>>> >> https://techcommunity.microsoft.com/t5/sql-server-support/understanding-optimizer-timeout-and-how-complex-queries-can-be/ba-p/319188 >>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Regards, >>>>>>>>>>>>> Haisheng >>>>>>>>>>>>> >>>>>>>>>>>>> On 2020/04/19 11:31:27, Seliverstov Igor < >>>> gvvinbl...@gmail.com >>>>>>> <mailto:gvvinbl...@gmail.com>> wrote: >>>>>>>>>>>>>> Haisheng, >>>>>>>>>>>>>> >>>>>>>>>>>>>> Ok, then such notification isn't needed >>>>>>>>>>>>>> >>>>>>>>>>>>>> But in this case we don't have any control over how long >>>> planning >>>>>>> takes >>>>>>>>>>>>>> >>>>>>>>>>>>>> For some systems it's necessary to get good enough plan >>>> right now >>>>>>> instead >>>>>>>>>>>>>> of best one after while >>>>>>>>>>>>>> >>>>>>>>>>>>>> For example we've been considering a case when a query is >>>>>>> optimised several >>>>>>>>>>>>>> times in 
short iterations in case it's impossible to get >> the >>>> best >>>>>>> plan in >>>>>>>>>>>>>> reasonable period of time (for example there is an SLA >> for >>>>>>> response time) >>>>>>>>>>>>>> >>>>>>>>>>>>>> This mean we need all needed physical implementations >> after >>>> each >>>>>>> logical >>>>>>>>>>>>>> transformation is applied. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Regards, >>>>>>>>>>>>>> Igor >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> вс, 19 апр. 2020 г., 13:55 Haisheng Yuan < >> hy...@apache.org >>>>>>> <mailto:hy...@apache.org>>: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> Hi Igor, >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> There will be no rule queue anymore. Y will be fully >>>> explored >>>>>>> (logical >>>>>>>>>>>>>>> rules are matched and applied) before it can be >> implemented >>>> and >>>>>>> optimized. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>> Haisheng >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On 2020/04/19 10:11:45, Seliverstov Igor < >>>> gvvinbl...@gmail.com >>>>>>> <mailto:gvvinbl...@gmail.com>> wrote: >>>>>>>>>>>>>>>> Hi Haisheng, >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Great explanation, thanks. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> One thing I'd like to cover in advance is trait >> propagation >>>>>>> process (I >>>>>>>>>>>>>>> need >>>>>>>>>>>>>>>> it for Apache Ignite SQL engine implementation). 
>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> For example: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> There are two nodes: Rel X and its child node Rel Y >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Both nodes are in Optimized state, and there is a >> Logical >>>> rule >>>>>>> for Rel Y >>>>>>>>>>>>>>> in >>>>>>>>>>>>>>>> a rules queue matched, >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> After logical rule is applied, there is a logical rel >> Rel >>>> Y' >>>>>> and >>>>>>>>>>>>>>> containing >>>>>>>>>>>>>>>> it set moves into Exploring state (since there is a >> logical >>>>>> node >>>>>>> which >>>>>>>>>>>>>>> has >>>>>>>>>>>>>>>> to be implemented) >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> After whole process Exploring -> ... -> Optimized we >> need >>>> to >>>>>>> force parent >>>>>>>>>>>>>>>> Rel X set to switch its state from Optimized to >> Optimizing >>>> to >>>>>>> peek the >>>>>>>>>>>>>>>> newly implemented child and (possible) produce a new >> Rel X' >>>>>>> physical rel >>>>>>>>>>>>>>> on >>>>>>>>>>>>>>>> the basis of previously requested from the Rel X set >>>> traits. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> If a new Rel X' is produced, a Rel X' parent should >> move >>>> its >>>>>>> state >>>>>>>>>>>>>>>> Optimized -> Optimizing and repeat described above >>>> operations. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Does it look like true? >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Regards, >>>>>>>>>>>>>>>> Igor >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> вс, 19 апр. 2020 г., 6:52 Haisheng Yuan < >>>>>> h.y...@alibaba-inc.com >>>>>>> <mailto:h.y...@alibaba-inc.com>>: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Hi, >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> In the past few months, we have discussed a lot about >>>> Cascades >>>>>>> style >>>>>>>>>>>>>>>>> top-down optimization, including on-demand trait >>>>>>> derivation/request, >>>>>>>>>>>>>>> rule >>>>>>>>>>>>>>>>> apply, branch and bound space pruning. Now we think >> it is >>>> time >>>>>>> to move >>>>>>>>>>>>>>>>> towards these targets. 
>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> We will separate it into several small issues, and >> each >>>> one >>>>>> can >>>>>>> be >>>>>>>>>>>>>>>>> integrated as a standalone, independent feature, and >> most >>>>>>> importantly, >>>>>>>>>>>>>>>>> meanwhile keep backward compatibility. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> 1. Top-down trait request >>>>>>>>>>>>>>>>> In other words, pass traits requirements from parent >>>> nodes to >>>>>>> child >>>>>>>>>>>>>>> nodes. >>>>>>>>>>>>>>>>> The trait requests happens after all the logical >>>>>> transformation >>>>>>> rules >>>>>>>>>>>>>>> and >>>>>>>>>>>>>>>>> physical implementation rules are done, in a top-down >>>> manner, >>>>>>> driven >>>>>>>>>>>>>>> from >>>>>>>>>>>>>>>>> root set. e.g.: >>>>>>>>>>>>>>>>> SELECT a, sum(c) FROM >>>>>>>>>>>>>>>>> (SELECT * FROM R JOIN S USING (a, b)) t >>>>>>>>>>>>>>>>> GROUP BY a; >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Suppose we have the following plan tree in the MEMO, >> and >>>> let's >>>>>>> only >>>>>>>>>>>>>>>>> consider distribution for simplicity, each group >>>> represents a >>>>>>> RelSet >>>>>>>>>>>>>>> in the >>>>>>>>>>>>>>>>> MEMO. 
>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Group 1: Agg on [a] >>>>>>>>>>>>>>>>> Group 2: +-- MergeJoin on [a, b] >>>>>>>>>>>>>>>>> Group 3: |--- TableScan R >>>>>>>>>>>>>>>>> Group 4: +--- TableScan S >>>>>>>>>>>>>>>>> | Group No | Operator | Derived Traits | Required >>>> Traits | >>>>>>>>>>>>>>>>> | ----------- | ------------- | --------------- | >>>>>>> --------------- | >>>>>>>>>>>>>>>>> | Group 1 | Aggregate | Hash[a] | N/A >>>>>>> | >>>>>>>>>>>>>>>>> | Group 2 | MergeJoin | Hash[a,b] | Hash[a] >>>>>> | >>>>>>>>>>>>>>>>> | Group 3 | TableScan R | None | Hash[a,b] >>>> | >>>>>>>>>>>>>>>>> | Group 4 | TableScan S | None | Hash[a,b] >>>> | >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> We will add new interface PhysicalNode (or extending >>>> RelNode) >>>>>>> with >>>>>>>>>>>>>>>>> methods: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> - Pair<RelTraitSet,List<RelTraitSet>> >>>>>> requireTraits(RelTraitSet >>>>>>>>>>>>>>> required); >>>>>>>>>>>>>>>>> pair.left is the current operator's new traitset, >>>> pair.right >>>>>> is >>>>>>> the >>>>>>>>>>>>>>> list >>>>>>>>>>>>>>>>> of children's required traitset. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> - RelNode passThrough(RelTraitSet required); >>>>>>>>>>>>>>>>> Default implementation will call above method >>>> requireTraits() >>>>>>> and >>>>>>>>>>>>>>>>> RelNode.copy() to create new RelNode. Available to be >>>>>> overriden >>>>>>> for >>>>>>>>>>>>>>>>> physical operators to customize the logic. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> The planner will call above method on MergeJoin >> operator >>>> to >>>>>>> pass the >>>>>>>>>>>>>>>>> required traits (Hash[a]) to Mergejoin's child >> operators. 
>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> We will get a new MergeJoin: >>>>>>>>>>>>>>>>> MergeJoin hash[a] >>>>>>>>>>>>>>>>> |---- TableScan R hash[a] (RelSubset) >>>>>>>>>>>>>>>>> +---- TableScan S hash[a] (RelSubset) >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Now the MEMO group looks like: >>>>>>>>>>>>>>>>> | Group No | Operator | Derived Traits | >> Required >>>>>>> Traits >>>>>>>>>>>>>>> | >>>>>>>>>>>>>>>>> | ---------- | -------- ----- | -------------------- | >>>>>>>>>>>>>>>>> --------------------- | >>>>>>>>>>>>>>>>> | Group1 | Aggregate | Hash[a] | >> N/A >>>>>>>>>>>>>>>>> | >>>>>>>>>>>>>>>>> | Group2 | MergeJoin | Hash[a,b], Hash[a]| Hash[a] >>>>>>>>>>>>>>>>> | >>>>>>>>>>>>>>>>> | Group3 | TableScan R | None | >>>>>> Hash[a,b], >>>>>>>>>>>>>>> Hash[a] >>>>>>>>>>>>>>>>> | >>>>>>>>>>>>>>>>> | Group4 | TableScan S | None | >>>>>> Hash[a,b], >>>>>>>>>>>>>>> Hash[a] >>>>>>>>>>>>>>>>> | >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Calcite user may choose to ignore / not implement the >>>>>> interface >>>>>>> to keep >>>>>>>>>>>>>>>>> the original behavior. Each physical operator, >> according >>>> to >>>>>> its >>>>>>> own >>>>>>>>>>>>>>> logic, >>>>>>>>>>>>>>>>> decides whether passThrough() should pass traits down >> or >>>> not >>>>>> by >>>>>>>>>>>>>>> returning: >>>>>>>>>>>>>>>>> - a non-null RelNode, which means it can pass down >>>>>>>>>>>>>>>>> - null object, which means can't pass down >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> 2. Provide option to disable AbstractConverter >>>>>>>>>>>>>>>>> Once the plan can request traits in top-down way in >> the >>>>>>> framework, many >>>>>>>>>>>>>>>>> system don't need AbstractConverter anymore, since it >> is >>>> just >>>>>> a >>>>>>>>>>>>>>>>> intermediate operator to generate physical sort / >>>> exchange. 
> For those systems, we can provide an option to disable
> AbstractConverter and generate the physical enforcer directly, by
> adding a method to the Convention interface:
> - RelNode enforce(RelNode node, RelTraitSet traits);
>
> The default implementation may simply call
> changeTraitsUsingConverters(), but people may choose to override it if
> the system has special needs, e.g. several traits must be implemented
> together, or collation is positioned before distribution in the
> RelTraitSet.
>
> 3. Top-down driven, on-demand rule apply
> For every RelNode in a RelSet, rules are matched and applied
> sequentially; newly generated RelNodes are added to the end of the
> RelNode list in the RelSet, waiting for rule apply. RuleQueue and
> DeferringRuleCall are not needed anymore. This will make space pruning
> and rule mutual exclusivity checks possible.
>
> There are 3 stages for each RelSet:
> a) Exploration: logical transformation, yields logical nodes
> b) Implementation: physical transformation, yields physical nodes
> c) Optimization: trait request, enforcement
>
> The general process looks like:
>
> - optimize RelSet X:
>     implement RelSet X
>     for each physical relnode in RelSet X:
>       // pass down trait requests to child RelSets
>       for each child RelSet Y of current relnode:
>         optimize RelSet Y
>
> - implement RelSet X:
>     if X has been implemented:
>       return
>     explore RelSet X
>     for each relnode in RelSet X:
>       apply physical rules
>
> - explore RelSet X:
>     if X has been explored:
>       return
>     for each relnode in RelSet X:
>       // ensure each child RelSet of current relnode is explored
>       for each child RelSet Y of current relnode:
>         explore RelSet Y
>       apply logical rules on current relnode
>
> Basically it is a state machine with several states (Initialized,
> Exploring, Explored, Implementing, Implemented, Optimizing, Optimized)
> and several transition methods: exploreRelSet, exploreRelNode,
> implementRelSet, implementRelNode, optimizeRelSet, optimizeRelNode...
>
> To achieve this, we need to mark each rule as either a logical rule or
> a physical rule. To keep backward compatibility, all unmarked rules
> will be treated as logical rules, except rules that use
> AbstractConverter as a rule operand; those rules still need to be
> applied top-down, or in random order.
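The quoted three-stage recursion can be sketched in a few lines of Java. This is only an illustration of the control flow: Group stands in for RelSet, and every name here is hypothetical, not Calcite API.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the explore -> implement -> optimize state machine described
// in the pseudocode above. All names are illustrative placeholders.
class Group {
    enum State { INITIALIZED, EXPLORED, IMPLEMENTED, OPTIMIZED }

    final String name;
    final List<Group> children = new ArrayList<>();
    final List<String> trace;            // records stage order, for demonstration
    State state = State.INITIALIZED;

    Group(String name, List<String> trace) {
        this.name = name;
        this.trace = trace;
    }

    // Exploration: make sure children are explored, then apply logical rules.
    void explore() {
        if (state.compareTo(State.EXPLORED) >= 0) return;
        for (Group child : children) child.explore();
        trace.add("explore " + name);    // logical rules would fire here
        state = State.EXPLORED;
    }

    // Implementation: make sure explored, then apply physical rules.
    void implement() {
        if (state.compareTo(State.IMPLEMENTED) >= 0) return;
        explore();
        trace.add("implement " + name);  // physical rules would fire here
        state = State.IMPLEMENTED;
    }

    // Optimization: implement this group, then pass trait requests down.
    void optimize() {
        if (state == State.OPTIMIZED) return;
        implement();
        for (Group child : children) child.optimize();
        trace.add("optimize " + name);   // trait request / enforcement here
        state = State.OPTIMIZED;
    }
}
```

Note how the driving call is top-down (optimize the root), while exploration still reaches the leaves first, which matches the "top-down driven, on-demand" description.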
> 4. On-demand, bottom-up trait derivation
> It is called bottom-up, but it is actually driven top-down and happens
> at the same time as the top-down trait requests, in the optimization
> stage mentioned above. Many Calcite-based big data systems only
> propagate traits on Project and Filter by writing rules, which is very
> limited. In fact, we can extend trait propagation/derivation to all
> operators, without rules, by adding an interface PhysicalNode (or
> extending RelNode) with the method:
> - RelNode derive(RelTraitSet traits, int childId);
>
> Given the following plan (only considering distribution for simplicity):
>
> Agg [a,b]
> +-- MergeJoin [a]
>     |---- TableScan R
>     +--- TableScan S
>
> Hash[a] won't satisfy Hash[a,b] without special treatment, because
> there isn't a mechanism to coordinate traits between children.
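To make that satisfaction point concrete: whether a delivered Hash[a] satisfies a required Hash[a,b] depends entirely on the matching policy. A toy sketch of exact vs. partial key matching follows; HashDist and both method names are made up for illustration, not Calcite's RelDistribution API.

```java
import java.util.List;

// Toy model of hash-distribution satisfaction. Under exact matching a
// delivered Hash[a] does NOT satisfy a required Hash[a,b], even though
// rows equal on (a,b) are necessarily co-located when hashed on a alone.
final class HashDist {
    final List<String> keys;             // hash distribution keys

    HashDist(String... keys) { this.keys = List.of(keys); }

    // Exact match: delivered keys must equal the required keys.
    boolean satisfiesExact(HashDist required) {
        return keys.equals(required.keys);
    }

    // Partial match: delivered keys form a non-empty subset of the
    // required keys; acceptable only if the parent operator opts in.
    boolean satisfiesPartial(HashDist required) {
        return !keys.isEmpty() && required.keys.containsAll(keys);
    }
}
```

The derive() callback above is what lets a parent react when only a partial match is available, instead of unconditionally inserting an exchange.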
> Now we call the derive method on the Agg [a,b] node, derive(Hash[a],
> 0), and we get the new node:
>
> Agg [a]
> +-- MergeJoin [a] (RelSubset)
>
> We will provide different matching types, so each operator can specify
> what kind of matching it requires of its children:
> - MatchType getMatchType(RelTrait trait, int childId);
>
> a) Exact: Hash[a,b] exactly matches Hash[a,b], aka satisfies
> b) Partial: Hash[a] partially matches Hash[a,b]
> c) Permuted: Sort[a,b,c] permuted-matches Sort[c,b,a]
>
> In addition, an optimization order is provided for each operator to
> specify:
> a) left to right
> b) right to left
> c) both
>
> For example, for the query SELECT * FROM R JOIN S ON R.a=S.a AND
> R.b=S.b AND R.c=S.c, suppose R is distributed by a,b and S is
> distributed by c:
>
> MergeJoin [a,b,c]
> |--- TableScan R [a,b]
> +-- TableScan S [c]
>
> a) left to right: call derive(Hash[a,b], 0), we get MergeJoin [a,b]
> b) right to left: call derive(Hash[c], 1), we get MergeJoin [c], most
> likely a bad plan
> c) both: get the above 2 plans
>
> For a system that doesn't have fine-tuned stats and a cost model, it
> may not be able to make the right decision based purely on cost.
> Probably we need to provide the MergeJoin with both children's derived
> traitset lists.
> - List<RelNode> derive(List<List<RelTraitSet>>);
>
> Of course, all the above methods are optional to implement for those
> who don't need this feature.
>
> 5. Branch and Bound Space Pruning
> After we implement on-demand, top-down trait enforcement and rule
> apply, we can pass the cost limit down at the time of passing down
> required traits, as described in the classical Cascades paper. Right
> now, Calcite doesn't provide group-level logical properties, including
> stats info; each operator in the same group has its own logical
> properties and the stats may vary, so we can only do limited space
> pruning for trait enforcement, which is still good. But if we agree to
> add an option to share group-level stats between relnodes in a RelSet,
> we will be able to do more aggressive space pruning, which will help
> boost the performance of join-reorder planning.
>
> With all that being said, how do we move forward?
>
> There are 2 ways:
> a) Modify the current VolcanoPlanner.
> Pros: code reuse, existing numerous test cases and infrastructure,
> fast integration
> Cons: changing code always brings risk
>
> b) Add a new XXXXPlanner
> Pros: no risk, no tech debt, no need to worry about backward
> compatibility
> Cons: separate test cases for the new planner, one more planner to
> maintain
>
> We'd like to hear the community's thoughts and advice.
>
> Thanks.
> - Haisheng