I mean that no rule should apply to physical operators (ProjectMergeRule, for example), regardless of phase. All transformations should be applied to logical operators or AbstractConverter only, except for special needs, e.g. massaging/hacking the plan through rules after the final plan is generated.

And it is not generating fewer alternatives; it is avoiding generating duplicate alternatives. The Volcano framework does separate the logical exploration and physical implementation phases:

"In the Volcano search strategy, a first phase applied all transformation rules to create all possible logical expressions for a query and all its subtrees. The second phase, which performed the actual optimization, navigated within that network of equivalence classes and expressions, applied implementation rules to obtain plans, and determined the best plan." [1], p21

Even when they are mixed together, as in the Cascades framework, I don't see that Calcite has big benefits over System-R in this regard, because there is no branch-and-bound search space pruning, which is hard to implement correctly and precisely. The Greenplum optimizer only does branch and bound during the physical implementation phase.

[1] https://www.cse.iitb.ac.in/infolab/Data/Courses/CS632/Papers/Cascades-graefe.pdf

- Haisheng

------------------------------------------------------------------
From: Xiening Dai<[email protected]>
Date: October 29, 2019, 06:13:13
To: <[email protected]>
Subject: Re: Same rules fired for logical and physical nodes

I opened a PR - https://github.com/apache/calcite/pull/1543

> On Oct 28, 2019, at 3:00 PM, Xiening Dai <[email protected]> wrote:
>
> Thanks for your input.
>
> I think this doesn’t relate to the Volcano theory. I don’t try to separate
> the logical transformation from physical implementation. What I propose is
> not to fire the same transformation rule on physical nodes, as those
> transformations are already done on the logical nodes.
>
> I tested out my patch, and all UTs passed. I can open a PR for that.
>
>
>> On Oct 28, 2019, at 2:18 PM, Stamatis Zampetakis <[email protected]> wrote:
>>
>> Hello,
>>
>> It is not surprising that by reducing the scope of the rules the
>> planning time goes down since we are generating fewer alternatives.
>>
>> Mixing logical with physical optimizations is costly but it is one of the
>> big benefits offered by the Volcano framework in contrast with the System-R
>> style optimization where logical and physical transformations are
>> completely separated.
>>
>> Indeed in some cases having the same rule fire for both matching logical
>> and physical nodes does not make sense so we could apply some tunings.
>>
>> @Xiening: Have you tried running all tests with the modifications you
>> mentioned? Aren't we missing any good plans?
>> @Haisheng: What do you mean by saying that rules should match logical
>> operators only?
>>
>> Best,
>> Stamatis
>>
>> On Mon, Oct 28, 2019 at 8:54 PM Haisheng Yuan <[email protected]>
>> wrote:
>>
>>> Agree, it is indeed redundant. We had a discussion about this in pull
>>> request #1130 [1].
>>>
>>> Many of these rules not only match physical operators, but still generate
>>> new logical operators.
>>>
>>> IMHO, rules should match logical operators only.
>>>
>>> [1] https://github.com/apache/calcite/pull/1130
>>>
>>> On 2019/10/28 16:47:55, Xiening Dai wrote:
>>>> Hi all,
>>>>
>>>> While I was looking at CALCITE-2970, I noticed that some of the rules
>>>> are fired for both logical and physical nodes. For example,
>>>> ProjectMergeRule matches Project.class, so it’s fired for LogicalProject.
>>>> But then after LogicalProject is converted into EnumerableProject, the
>>>> same rule is fired again for the physical rels. Same for
>>>> EnumerableLimitRule, SortRemoveConstantKeysRule, etc.
>>>>
>>>> This seems to be unnecessary. When ProjectMergeRule is applied to
>>>> LogicalProject nodes, we already generate all possible alternatives with
>>>> merged projects. We just need to convert the LogicalProject into
>>>> EnumerableProject. There’s no need to merge EnumerableProject again.
>>>>
>>>> If I update those rules to only match logical nodes, the planning time
>>>> of the case in CALCITE-2970 is reduced ~30%.
>>>>
>>>> Any thoughts?
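
The original message's point about ProjectMergeRule matching Project.class can be illustrated with a small toy model. This is only a sketch and does not use Calcite's API: RuleMatchDemo, countMatches and nodesAfterConversion are hypothetical names, and the three nested classes merely mimic the shape of the Project hierarchy described above. It shows why an operand declared on the abstract class matches both the logical node and its converted physical counterpart, while an operand declared on the logical class matches only once.

```java
// Toy model only: these classes mimic the shape of the Project hierarchy
// discussed in the thread. They are NOT Calcite's classes or API.
class RuleMatchDemo {
  // Abstract operator shared by the logical and physical variants.
  static abstract class Project {}
  static final class LogicalProject extends Project {}
  static final class EnumerableProject extends Project {}

  /** Counts how many nodes a rule with the given operand class would match. */
  static int countMatches(Class<? extends Project> operandClass, Project[] nodes) {
    int matches = 0;
    for (Project node : nodes) {
      // A rule operand matches any node that is an instance of its class.
      if (operandClass.isInstance(node)) {
        matches++;
      }
    }
    return matches;
  }

  /** Node set after a LogicalProject has also been converted to EnumerableProject. */
  static Project[] nodesAfterConversion() {
    return new Project[] {new LogicalProject(), new EnumerableProject()};
  }

  public static void main(String[] args) {
    Project[] nodes = nodesAfterConversion();
    // Operand on the abstract class: matches the logical and the physical node.
    System.out.println("Project.class operand: "
        + countMatches(Project.class, nodes));
    // Operand on the logical class: matches the logical node only.
    System.out.println("LogicalProject.class operand: "
        + countMatches(LogicalProject.class, nodes));
  }
}
```

In this model the first count is 2 and the second is 1, which corresponds to the duplicate firing described in the thread and to the proposal of narrowing the operand to the logical class.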
