Igor, That's great.
On 2020/04/20 11:17:49, Seliverstov Igor <gvvinbl...@gmail.com> wrote:
> Haisheng, Xiening,
>
> Ok, now I see how it should work.
>
> Thanks for your replies.
>
> Regards,
> Igor
>
> > On 20 Apr 2020, at 09:56, Seliverstov Igor <gvvinbl...@gmail.com> wrote:
> >
> > Haisheng, Xiening,
> >
> > Thanks for clarifying.
> >
> > "In this proposal, we are not trying to split logical and physical planning entirely." - actually, I was in doubt about the idea of entirely splitting the logical and physical phases; if you aren't going to, I have no objections.
> >
> > But it brings me back to my first question: how will we propagate traits in a bottom-up manner using the proposed approach? (See "[DISCUSS] Proposal to add API to force rules matching specific rels" for details.)
> >
> > One of the inconveniences of the current VolcanoPlanner implementation is the number of tricks we need to get the desired behaviour. It would be great if some of those issues (or all of them) were solved in the new approach.
> >
> > Regards,
> > Igor
> >
> > On Mon, 20 Apr 2020 at 7:02, Xiening Dai <xndai....@gmail.com <mailto:xndai....@gmail.com>> wrote:
> > Hi Igor,
> >
> > Your comment - "because actual cost may be calculated correctly using physical operators only. So won't be able to implement Branch and Bound Space Pruning." - is actually not true. In Cascades' lower bound / upper bound pruning algorithm, you can get the cost lower bound of an input RelNode using cardinality * unit_copy_cost. The unit_copy_cost is a constant, which stands for the minimal cost of processing a tuple from the input. So this will be the minimal cost the input RelNode can achieve, and if this is indeed larger than the current best cost, this input node can be pruned.
> >
> > In this proposal, we are not trying to split logical and physical planning entirely. But for any given equivalence set, we would need to finish exploration before implementation.
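The lower-bound arithmetic described above can be sketched as follows. This is an illustrative Python model, not Calcite code; the function names and the unit_copy_cost value are assumptions made for the example.

```python
# Lower-bound pruning sketch: the cheapest any input group can possibly be
# is cardinality * unit_copy_cost, so an input whose lower bound already
# meets or exceeds the current best (upper bound) cost can be pruned
# without ever implementing it.

UNIT_COPY_COST = 1.0  # assumed constant: minimal cost to process one tuple

def cost_lower_bound(cardinality: float) -> float:
    """Cheapest conceivable cost for an input producing `cardinality` rows."""
    return cardinality * UNIT_COPY_COST

def can_prune(input_cardinality: float, current_best_cost: float) -> bool:
    """Prune the input group if even its lower bound cannot beat the best plan."""
    return cost_lower_bound(input_cardinality) >= current_best_cost

# A group producing 1M rows cannot beat a finished plan costing 500k units.
print(can_prune(1_000_000, 500_000))  # True: safe to prune
print(can_prune(1_000, 500_000))      # False: still worth exploring
```

The point of the example is that the bound needs no physical operator at all: cardinality comes from logical stats, and unit_copy_cost is a constant.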
> > But for the entire memo tree, each set could be in a different planning stage, and an equivalence set can be pruned even before it's implemented. A textbook example of the aforementioned "lower bound / upper bound pruning" would be the join ordering case.
> >
> > Regarding #3, I think we can still achieve that (partially) through this proposal. Remember, every time we start the optimization, we pass down an upper bound limit. Initially this upper bound for the root is infinite - meaning that no feasible plan is available. Every time we find a physical plan, we update this upper bound, then start the search again. We could stop the search when the cost is less than a pre-defined threshold - which gives you a "good enough" plan with early termination. Still, this wouldn't avoid the logical exploration. For that, you would probably achieve it through rule configuration, avoiding some expensive transformations to keep the cost down.
> >
> > > On Apr 19, 2020, at 7:30 PM, Haisheng Yuan <hy...@apache.org <mailto:hy...@apache.org>> wrote:
> > >
> > > Igor,
> > >
> > > a) Given Calcite's current stats derivation strategy, mixing logical and physical planning won't make it better. I hope you went through my email to the end; currently, operators inside a memo group don't share stats info, each operator's stats may differ from the others', and the RelSubset's best node and stats may vary during optimization. So logical and physical rule pruning are not safe at the moment; otherwise it almost implies changing downstream projects' cost models silently.
> > >
> > > On the other hand, ensuring that a child group's exploration task finishes first will make rule mutual exclusivity checks possible; e.g. the result of ReduceExpressionRule won't need to trigger the same rule again, and the join result of JoinCommuteRule won't need to trigger JoinCommuteRule and ReduceExpressionRule again.
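The early-termination loop described above can be sketched in a few lines. This is an illustrative Python model under assumed names; plan_costs stands in for the successive complete plans the search discovers.

```python
import math

# Upper-bound iteration with early termination: start with an infinite
# upper bound (no feasible plan yet), tighten it whenever a cheaper
# complete physical plan is found, and stop once the best cost drops to
# a pre-defined "good enough" threshold.

def search(plan_costs, good_enough=100.0):
    upper_bound = math.inf            # no feasible plan available yet
    for cost in plan_costs:
        if cost < upper_bound:        # found a better complete plan
            upper_bound = cost
        if upper_bound <= good_enough:
            break                     # early termination: good enough plan
    return upper_bound

print(search([500.0, 240.0, 90.0, 50.0]))  # 90.0: stops early, never sees 50.0
```

As the thread notes, this bounds the physical search but does not by itself avoid logical exploration.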
> > > More importantly, most if not all of the long-planning queries in Calcite are caused not by too many alternatives, but by mutual triggering, or cyclic triggering, which can be avoided by customizing the rules instead of using the default ones. Unless you use dynamic programming (Calcite uses heuristics) on join reordering (left-deep, right-deep, bushy), space pruning won't help make a long / infinitely running query faster.
> > >
> > > b) No evidence shows that the current version of Calcite will return the most promising plan in the first planning iteration. Instead of praying for a good enough plan in the first iteration, why not focus on fixing the rules that cause the issue?
> > >
> > > c) That is not the goal.
> > >
> > > On 2020/04/19 15:14:57, Seliverstov Igor <gvvinbl...@gmail.com <mailto:gvvinbl...@gmail.com>> wrote:
> > >> Haisheng,
> > >>
> > >> From my point of view, splitting the logical and physical planning parts isn't a good idea.
> > >>
> > >> I think so because actual cost may be calculated correctly using physical operators only. So we:
> > >> a) won't be able to implement Branch and Bound Space Pruning (as far as I understand, at exploration time there are no physical operators and no correct costs, only assumptions; I don't think we should rely on them)
> > >> b) won't be able to get a good enough plan (without Branch and Bound Space Pruning it's a non-trivial task to get the right search direction and the most promising plans in the first planning iterations)
> > >> c) won't be able to tune the good enough plan while several similar queries are executed
> > >>
> > >> Regards,
> > >> Igor
> > >>
> > >>> On 19 Apr 2020, at 17:37, Haisheng Yuan <hy...@apache.org <mailto:hy...@apache.org>> wrote:
> > >>>
> > >>> Hi Igor,
> > >>>
> > >>> You can't have your cake and eat it.
> > >>> But one feasible work item we can definitely do is: once we time out, stop exploring, use the first available physical operator in each group, and optimize it.
> > >>>
> > >>> Because most, if not all, of the long / infinitely running optimizations are caused by project-related transposes, join reordering (join commute + associative), or constant reduction of large expressions (see CALCITE-3460), all of which are logical transformation rules and many of which have corresponding JIRAs. So another alternative is: let's fix these bugs to improve Calcite and make the timeout option less necessary.
> > >>>
> > >>> Another thing worth noting is that the SQL Server optimizer's timeout mechanism is based not on clock time, but on the number of optimization tasks it has done [1].
> > >>>
> > >>> [1] https://techcommunity.microsoft.com/t5/sql-server-support/understanding-optimizer-timeout-and-how-complex-queries-can-be/ba-p/319188
> > >>>
> > >>> Regards,
> > >>> Haisheng
> > >>>
> > >>> On 2020/04/19 11:31:27, Seliverstov Igor <gvvinbl...@gmail.com <mailto:gvvinbl...@gmail.com>> wrote:
> > >>>> Haisheng,
> > >>>>
> > >>>> Ok, then such a notification isn't needed.
> > >>>>
> > >>>> But in this case we don't have any control over how long planning takes.
> > >>>>
> > >>>> For some systems it's necessary to get a good enough plan right now instead of the best one after a while.
> > >>>>
> > >>>> For example, we've been considering a case where a query is optimized several times in short iterations when it's impossible to get the best plan in a reasonable period of time (for example, there is an SLA for response time).
> > >>>>
> > >>>> This means we need all the needed physical implementations after each logical transformation is applied.
> > >>>>
> > >>>> Regards,
> > >>>> Igor
> > >>>>
> > >>>> On Sun, 19 Apr 2020 at 13:55, Haisheng Yuan <hy...@apache.org <mailto:hy...@apache.org>> wrote:
> > >>>>
> > >>>>> Hi Igor,
> > >>>>>
> > >>>>> There will be no rule queue anymore. Y will be fully explored (logical rules are matched and applied) before it can be implemented and optimized.
> > >>>>>
> > >>>>> Thanks,
> > >>>>> Haisheng
> > >>>>>
> > >>>>> On 2020/04/19 10:11:45, Seliverstov Igor <gvvinbl...@gmail.com <mailto:gvvinbl...@gmail.com>> wrote:
> > >>>>>> Hi Haisheng,
> > >>>>>>
> > >>>>>> Great explanation, thanks.
> > >>>>>>
> > >>>>>> One thing I'd like to cover in advance is the trait propagation process (I need it for the Apache Ignite SQL engine implementation).
> > >>>>>>
> > >>>>>> For example:
> > >>>>>>
> > >>>>>> There are two nodes: Rel X and its child node Rel Y.
> > >>>>>>
> > >>>>>> Both nodes are in the Optimized state, and there is a logical rule for Rel Y matched in the rule queue.
> > >>>>>>
> > >>>>>> After the logical rule is applied, there is a logical rel Rel Y', and the set containing it moves into the Exploring state (since there is a logical node which has to be implemented).
> > >>>>>>
> > >>>>>> After the whole Exploring -> ... -> Optimized process, we need to force the parent Rel X set to switch its state from Optimized to Optimizing, to pick up the newly implemented child and (possibly) produce a new physical rel Rel X' on the basis of the traits previously requested from the Rel X set.
> > >>>>>>
> > >>>>>> If a new Rel X' is produced, the parent of Rel X' should move its state Optimized -> Optimizing and repeat the operations described above.
> > >>>>>>
> > >>>>>> Does that sound right?
> > >>>>>> Regards,
> > >>>>>> Igor
> > >>>>>>
> > >>>>>> On Sun, 19 Apr 2020 at 6:52, Haisheng Yuan <h.y...@alibaba-inc.com <mailto:h.y...@alibaba-inc.com>> wrote:
> > >>>>>>
> > >>>>>>> Hi,
> > >>>>>>>
> > >>>>>>> In the past few months, we have discussed a lot about Cascades-style top-down optimization, including on-demand trait derivation/request, rule application, and branch-and-bound space pruning. Now we think it is time to move towards these targets.
> > >>>>>>>
> > >>>>>>> We will separate the work into several small issues, each of which can be integrated as a standalone, independent feature that, most importantly, keeps backward compatibility.
> > >>>>>>>
> > >>>>>>> 1. Top-down trait request
> > >>>>>>>
> > >>>>>>> In other words, pass trait requirements from parent nodes to child nodes. The trait requests happen after all the logical transformation rules and physical implementation rules are done, in a top-down manner, driven from the root set. E.g.:
> > >>>>>>>
> > >>>>>>>   SELECT a, sum(c) FROM
> > >>>>>>>     (SELECT * FROM R JOIN S USING (a, b)) t
> > >>>>>>>   GROUP BY a;
> > >>>>>>>
> > >>>>>>> Suppose we have the following plan tree in the MEMO, and let's consider only distribution for simplicity; each group represents a RelSet in the MEMO.
> > >>>>>>> Group 1: Agg on [a]
> > >>>>>>> Group 2: +-- MergeJoin on [a, b]
> > >>>>>>> Group 3:     |--- TableScan R
> > >>>>>>> Group 4:     +--- TableScan S
> > >>>>>>>
> > >>>>>>> | Group No | Operator    | Derived Traits | Required Traits |
> > >>>>>>> | -------- | ----------- | -------------- | --------------- |
> > >>>>>>> | Group 1  | Aggregate   | Hash[a]        | N/A             |
> > >>>>>>> | Group 2  | MergeJoin   | Hash[a,b]      | Hash[a]         |
> > >>>>>>> | Group 3  | TableScan R | None           | Hash[a,b]       |
> > >>>>>>> | Group 4  | TableScan S | None           | Hash[a,b]       |
> > >>>>>>>
> > >>>>>>> We will add a new interface PhysicalNode (or extend RelNode) with methods:
> > >>>>>>>
> > >>>>>>> - Pair<RelTraitSet, List<RelTraitSet>> requireTraits(RelTraitSet required);
> > >>>>>>>   pair.left is the current operator's new traitset; pair.right is the list of the children's required traitsets.
> > >>>>>>>
> > >>>>>>> - RelNode passThrough(RelTraitSet required);
> > >>>>>>>   The default implementation will call the above method requireTraits() and RelNode.copy() to create a new RelNode. It can be overridden by physical operators to customize the logic.
> > >>>>>>>
> > >>>>>>> The planner will call the above method on the MergeJoin operator to pass the required traits (Hash[a]) to MergeJoin's child operators.
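The requireTraits()/passThrough() contract just described can be modeled in a few lines. This is a Python sketch of the proposed Java interface, not Calcite code; trait sets are simplified to plain sets of hash-distribution keys, and the subset rule in require_traits is an assumed policy for the example.

```python
# passThrough contract: a physical operator receives required traits from
# its parent and either returns a copy of itself with per-child required
# traits, or None when it cannot pass the requirement down.

class MergeJoin:
    def __init__(self, traits, join_keys):
        self.traits = traits          # derived distribution, e.g. {"a", "b"}
        self.join_keys = join_keys

    def require_traits(self, required):
        # Assumed policy: can only redistribute on a subset of the join keys.
        if not required <= self.join_keys:
            return None
        # pair.left: this operator's new traits; pair.right: children's.
        return required, [required, required]

    def pass_through(self, required):
        pair = self.require_traits(required)
        if pair is None:
            return None               # cannot pass the requirement down
        new_traits, child_required = pair
        return MergeJoin(new_traits, self.join_keys), child_required

join = MergeJoin(traits={"a", "b"}, join_keys={"a", "b"})
new_join, child_reqs = join.pass_through({"a"})  # parent Agg requires Hash[a]
print(new_join.traits, child_reqs)               # {'a'} [{'a'}, {'a'}]
```

The None return models the "null object, which means can't pass down" case mentioned below; requesting {"c"} here would return None because c is not a join key.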
> > >>>>>>> We will get a new MergeJoin:
> > >>>>>>>
> > >>>>>>>   MergeJoin hash[a]
> > >>>>>>>   |---- TableScan R hash[a] (RelSubset)
> > >>>>>>>   +---- TableScan S hash[a] (RelSubset)
> > >>>>>>>
> > >>>>>>> Now the MEMO group looks like:
> > >>>>>>>
> > >>>>>>> | Group No | Operator    | Derived Traits     | Required Traits    |
> > >>>>>>> | -------- | ----------- | ------------------ | ------------------ |
> > >>>>>>> | Group 1  | Aggregate   | Hash[a]            | N/A                |
> > >>>>>>> | Group 2  | MergeJoin   | Hash[a,b], Hash[a] | Hash[a]            |
> > >>>>>>> | Group 3  | TableScan R | None               | Hash[a,b], Hash[a] |
> > >>>>>>> | Group 4  | TableScan S | None               | Hash[a,b], Hash[a] |
> > >>>>>>>
> > >>>>>>> Calcite users may choose to ignore / not implement the interface to keep the original behavior. Each physical operator, according to its own logic, decides whether passThrough() should pass traits down or not by returning:
> > >>>>>>> - a non-null RelNode, which means it can pass down;
> > >>>>>>> - a null object, which means it can't pass down.
> > >>>>>>>
> > >>>>>>> 2. Provide an option to disable AbstractConverter
> > >>>>>>>
> > >>>>>>> Once the plan can request traits in a top-down way in the framework, many systems won't need AbstractConverter anymore, since it is just an intermediate operator to generate a physical sort / exchange.
> > >>>>>>> For those, we can provide an option to disable AbstractConverter and generate the physical enforcer directly by adding a method to the interface Convention:
> > >>>>>>>
> > >>>>>>> - RelNode enforce(RelNode node, RelTraitSet traits);
> > >>>>>>>
> > >>>>>>> The default implementation may just call changeTraitsUsingConverters(), but people may choose to override it if the system has special needs, e.g. several traits must be implemented together, or the position of collation in the RelTraitSet is before distribution.
> > >>>>>>>
> > >>>>>>> 3. Top-down driven, on-demand rule application
> > >>>>>>>
> > >>>>>>> For every RelNode in a RelSet, rules are matched and applied sequentially; newly generated RelNodes are added to the end of the RelNode list in the RelSet, waiting for rule application. RuleQueue and DeferringRuleCall are not needed anymore. This will make space pruning and rule mutual exclusivity checks possible.
> > >>>>>>>
> > >>>>>>> There are 3 stages for each RelSet:
> > >>>>>>> a) Exploration: logical transformation, yields logical nodes
> > >>>>>>> b) Implementation: physical transformation, yields physical nodes
> > >>>>>>> c) Optimization: trait request, enforcement
> > >>>>>>>
> > >>>>>>> The general process looks like:
> > >>>>>>>
> > >>>>>>>   - optimize RelSet X:
> > >>>>>>>       implement RelSet X
> > >>>>>>>       for each physical relnode in RelSet X:
> > >>>>>>>         // pass down trait requests to child RelSets
> > >>>>>>>         for each child RelSet Y of current relnode:
> > >>>>>>>           optimize RelSet Y
> > >>>>>>>
> > >>>>>>>   - implement RelSet X:
> > >>>>>>>       if X has been implemented:
> > >>>>>>>         return
> > >>>>>>>       explore RelSet X
> > >>>>>>>       for each relnode in RelSet X:
> > >>>>>>>         apply physical rules
> > >>>>>>>
> > >>>>>>>   - explore RelSet X:
> > >>>>>>>       if X has been explored:
> > >>>>>>>         return
> > >>>>>>>       for each relnode in RelSet X:
> > >>>>>>>         // ensure each child RelSet of current relnode is explored
> > >>>>>>>         for each child RelSet Y of current relnode:
> > >>>>>>>           explore RelSet Y
> > >>>>>>>         apply logical rules on current relnode
> > >>>>>>>
> > >>>>>>> Basically it is a state machine with several states - Initialized, Exploring, Explored, Implementing, Implemented, Optimizing, Optimized - and several transition methods: exploreRelSet, exploreRelNode, implementRelSet, implementRelNode, optimizeRelSet, optimizeRelNode...
> > >>>>>>>
> > >>>>>>> To achieve this, we need to mark each rule as either a logical rule or a physical rule.
> > >>>>>>> To keep backward compatibility, all unmarked rules will be treated as logical rules, except rules that use AbstractConverter as a rule operand; these rules still need to be applied top-down, or in random order.
> > >>>>>>>
> > >>>>>>> 4. On-demand, bottom-up trait derivation
> > >>>>>>>
> > >>>>>>> It is called bottom-up, but it is actually driven top-down; it happens at the same time as the top-down trait request, in the optimization stage mentioned above.
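The optimize/implement/explore recursion sketched above is runnable almost as written. Below is a Python model of that control flow under simplifying assumptions: one node per RelSet, boolean flags instead of the full state machine, and rule application reduced to a log entry. It is an illustration of the proposal's ordering guarantees, not VolcanoPlanner code.

```python
# Each RelSet is fully explored (logical rules) before it is implemented
# (physical rules), and optimization drives trait requests top-down.

class RelSet:
    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)
        self.explored = False
        self.implemented = False

log = []

def explore(s: RelSet):
    if s.explored:
        return
    s.explored = True
    for child in s.children:       # child RelSets are explored first
        explore(child)
    log.append(("logical rules", s.name))

def implement(s: RelSet):
    if s.implemented:
        return
    s.implemented = True
    explore(s)                     # exploration finishes before implementation
    log.append(("physical rules", s.name))

def optimize(s: RelSet):
    implement(s)
    for child in s.children:       # pass trait requests down, then recurse
        optimize(child)

scan_r, scan_s = RelSet("R"), RelSet("S")
join = RelSet("Join", [scan_r, scan_s])
root = RelSet("Agg", [join])
optimize(root)
print(log[0])  # ('logical rules', 'R'): leaf sets are explored first
```

Running it shows the invariant the proposal relies on: every logical rule on a set fires before any physical rule on that set, which is what makes mutual-exclusivity checks and pruning possible.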
> > >>>>>>> Many Calcite-based big data systems only propagate traits on Project and Filter by writing rules, which is very limited. In fact, we can extend trait propagation/derivation to all the operators, without rules, by adding to the interface PhysicalNode (or extending RelNode) the method:
> > >>>>>>>
> > >>>>>>> - RelNode derive(RelTraitSet traits, int childId);
> > >>>>>>>
> > >>>>>>> Given the following plan (considering only distribution for simplicity):
> > >>>>>>>
> > >>>>>>>   Agg [a,b]
> > >>>>>>>   +-- MergeJoin [a]
> > >>>>>>>       |--- TableScan R
> > >>>>>>>       +--- TableScan S
> > >>>>>>>
> > >>>>>>> Hash[a] won't satisfy Hash[a,b] without special treatment, because there isn't a mechanism to coordinate traits between children.
> > >>>>>>>
> > >>>>>>> Now we call the derive method on the Agg [a,b] node, derive(Hash[a], 0), and we get the new node:
> > >>>>>>>
> > >>>>>>>   Agg [a]
> > >>>>>>>   +-- MergeJoin [a] (RelSubset)
> > >>>>>>>
> > >>>>>>> We will provide different matching types, so each operator can specify what kind of matching type it requires of its children:
> > >>>>>>>
> > >>>>>>> - MatchType getMatchType(RelTrait trait, int childId);
> > >>>>>>>
> > >>>>>>> a) Exact: Hash[a,b] exactly matches Hash[a,b], aka satisfies
> > >>>>>>> b) Partial: Hash[a] partially matches Hash[a,b]
> > >>>>>>> c) Permuted: Sort[a,b,c] permuted-matches Sort[c,b,a]
> > >>>>>>>
> > >>>>>>> In addition, an optimization order is provided for each operator to specify:
> > >>>>>>> a) left to right
> > >>>>>>> b) right to left
> > >>>>>>> c) both
> > >>>>>>>
> > >>>>>>> For example, for the query SELECT * FROM R JOIN S ON R.a=S.a AND R.b=S.b AND R.c=S.c, suppose R is distributed by a,b and S is distributed by c:
> > >>>>>>>
> > >>>>>>>   MergeJoin [a,b,c]
> > >>>>>>>   |--- TableScan R [a,b]
> > >>>>>>>   +--- TableScan S [c]
> > >>>>>>>
> > >>>>>>> a) left to right: call derive(Hash[a,b], 0), and we get MergeJoin [a,b]
> > >>>>>>> b) right to left: call derive(Hash[c], 1), and we get MergeJoin [c], most likely a bad plan
> > >>>>>>> c) both: get both of the above plans.
> > >>>>>>>
> > >>>>>>> For a system that doesn't have fine-tuned stats and a cost model, it may not be able to make the right decision based purely on cost. We probably need to provide the MergeJoin with both children's derived traitset lists:
> > >>>>>>>
> > >>>>>>> - List<RelNode> derive(List<List<RelTraitSet>>);
> > >>>>>>>
> > >>>>>>> Of course, all the above methods are optional to implement for those who don't need this feature.
> > >>>>>>>
> > >>>>>>> 5. Branch and Bound Space Pruning
> > >>>>>>>
> > >>>>>>> After we implement on-demand, top-down trait enforcement and rule application, we can pass the cost limit at the time of passing down required traits, as described in the classical Cascades paper. Right now, Calcite doesn't provide group-level logical properties, including stats info; each operator in the same group has its own logical properties and the stats may vary, so we can only do limited space pruning for trait enforcement - still good. But if we agree to add an option to share group-level stats between relnodes in a RelSet, we will be able to do more aggressive space pruning, which will help boost the performance of join-reorder planning.
> > >>>>>>>
> > >>>>>>> With all that being said, how do we move forward?
> > >>>>>>>
> > >>>>>>> There are 2 ways:
> > >>>>>>>
> > >>>>>>> a) Modify the current VolcanoPlanner.
> > >>>>>>>    Pros: code reuse, existing numerous test cases and infrastructure, fast integration
> > >>>>>>>    Cons: changing code always brings risk
> > >>>>>>>
> > >>>>>>> b) Add a new XXXXPlanner.
> > >>>>>>>    Pros: no risk, no tech debt, no need to worry about backward compatibility
> > >>>>>>>    Cons: separate test cases for the new planner, one more planner to maintain
> > >>>>>>>
> > >>>>>>> We'd like to hear the community's thoughts and advice.
> > >>>>>>>
> > >>>>>>> Thanks.
> > >>>>>>> - Haisheng
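The derive() example from item 4 of the proposal above (Agg [a,b] over a child delivering Hash[a]) can be illustrated with a small model. This is an assumed Python sketch, not the proposed Java interface: distributions are tuples of hash keys, and "partial" matching is modeled as a subset test.

```python
# Bottom-up derivation sketch: given the child's actually-derived
# distribution, the parent derives a new variant of itself. A child hashed
# on a subset of the group-by keys (partial match) lets the aggregation
# run without re-shuffling, keyed on the child's distribution.

def derive_agg(group_keys, child_traits):
    """Derive a new Agg variant from the child's delivered distribution."""
    if set(child_traits) <= set(group_keys):
        return ("Agg", tuple(child_traits))  # e.g. Agg [a] over Hash[a]
    return None                              # no useful variant to derive

# Agg [a, b] over a MergeJoin that delivers Hash[a]:
print(derive_agg(("a", "b"), ("a",)))  # ('Agg', ('a',))
print(derive_agg(("a", "b"), ("c",)))  # None: Hash[c] doesn't help
```

This mirrors the thread's example where derive(Hash[a], 0) on Agg [a,b] yields a new Agg [a] node over the existing RelSubset.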