Haisheng, Xiening, Ok, Now I see how it should work.
Thanks for your replies. Regards, Igor > 20 апр. 2020 г., в 09:56, Seliverstov Igor <gvvinbl...@gmail.com> написал(а): > > Haisheng, Xiening, > > Thanks for clarifying. > > In this proposal, we are not trying to split logical and physical planning > entirely. - actually I was in doubt about an idea of entire splitting logical > and physical phases, if you aren't going to, I have no objections. > > But it returns me to my first question: how we will propagate traits in > bottom-up manner using proposed approach. (See [DISCUSS] Proposal to add API > to force rules matching specific rels for details) > > One of inconveniences of current VolcanoPlanner implementation is amount of > tricks that we need to get desired behaviour. It would be great if some of > issues (or all of them) were solved in the new approach. > > Regards, > Igor > > пн, 20 апр. 2020 г., 7:02 Xiening Dai <xndai....@gmail.com > <mailto:xndai....@gmail.com>>: > Hi Igor, > > Your comment - "because actual cost may be calculated correctly using > physical operators only. So won't be able to implement Branch and Bound Space > Pruning.“ is actually not true. In Cascade’s lower bound / upper bound > pruning algorithm, you can get cost lower bound of input RelNode using > cardinality * unit_copy_cost. The unit_copy_cost is a constant, which stands > for the minimal cost for processing a tuple from the input. So this will be > the minimal cost the input RelNode can achieve, and if this is indeed larger > than the current best cost, this input node can be pruned. > > In this proposal, we are not trying to split logical and physical planning > entirely. But for any given equivalent set, we would need to finish > exploration before implementation. But for the entire memo tree, each set > could be in different planning stage, and an equivalent set can be pruned > even before it’s implemented. A text book example of aforementioned “lower > bound / upper bound pruning” would be the join ordering case. > > Regarding #3, I think we can still achieve that (partially) through this > proposal. Remember every time when we start the optimization, we pass down an > upper bound limit. Initially this upper bound for root is infinite - meaning > that no feasible plan is available. Every time when we find a physical plan > we update this upper bound, then start the search again. We could stop the > search when the cost is less than a pre-defined threshold - which gives you a > “good enough” plan with early termination. Still this wouldn't avoid the > logical exploration. For that, you would probably archive through rule > configurations, and avoid some expansive transformation to keep the cost > down. > > > > On Apr 19, 2020, at 7:30 PM, Haisheng Yuan <hy...@apache.org > > <mailto:hy...@apache.org>> wrote: > > > > Igor, > > > > a) Given current Calcite's stats derivation strategy, mixing logical and > > physical planning won't make it better. I hope you went through my email to > > the end, currently operators inside a memo group don't share stats info, > > each operator's stats may differ with the other one, and the RelSubset's > > best node and stats may vary during optimization. So logical and physical > > rule pruning are not safe at the moment, otherwise it almost implies > > changing downstream project's cost model silently. > > > > On the other hand, ensuring child group exploration task finish first will > > make rule mutual exclusivity check possible, like the result of > > ReduceExpressionRule won't need trigger the same rule again, The join > > result of JoinCommuteRule won't need trigger JoinCommuteRule and > > ReduceExpressionRule again. > > > > More importantly, most if not all the the long planning queries in Calcite > > are not caused by too many alternatives, but mutual triggering, or cyclic > > triggering, which can be avoided by customizing the rules instead of using > > the default one. Unless you use dynamic programming (Calcite use heuristic) > > on join reordering (left-deep, right-deep, bushy), space pruning won't help > > make long / infinite running query faster. > > > > b) No evidence shows current version of Calcite will return the most > > promising plan in first planning iteration. Instead of praying for getting > > good enough plan in the first iteration, why not focus on fixing rules that > > causes the issue? > > > > c) That is not the goal. > > > > On 2020/04/19 15:14:57, Seliverstov Igor <gvvinbl...@gmail.com > > <mailto:gvvinbl...@gmail.com>> wrote: > >> Haisheng, > >> > >> From my point of view splitting logical and physical planning parts isn’t > >> a good idea. > >> > >> I think so because actual cost may be calculated correctly using physical > >> operators only. So that we: > >> a) won't be able to implement Branch and Bound Space Pruning (as far as I > >> understand, at exploring time there are no physical operators, no correct > >> costs, but assumptions only, I don’t think we should rely on them) > >> b) won’t be able to get good enough plan (without Branch and Bound Space > >> Pruning it’s non-trivial task to get right search direction and most > >> promising plans in first planning iterations) > >> c) won’t be able to tune the good enough plan during several similar > >> queries are executed > >> > >> Regards, > >> Igor > >> > >> > >>> 19 апр. 2020 г., в 17:37, Haisheng Yuan <hy...@apache.org > >>> <mailto:hy...@apache.org>> написал(а): > >>> > >>> Hi Igor, > >>> > >>> You can't have your cake and eat it. > >>> But one feasible work item definitely we can do is that once timeout, > >>> stop exploring, use the first available physical operator in each group > >>> and optimize it. > >>> > >>> Because most, if not all, of the long / infinite running optimizations > >>> are caused by project related transpose, join reordering (join commute + > >>> associative), constant reduction for large expression (see CALCITE-3460), > >>> all of which are logical transformations rules and many of which have > >>> corresponding JIRAs. So another alternative is, let's fix these bugs to > >>> improve Calcite to make timeout option less usable. > >>> > >>> Another thing worth noting is that sql server optimizer timeout mechanism > >>> is not based on clock time, but the number of optimization tasks it has > >>> done [1]. > >>> > >>> [1] > >>> https://techcommunity.microsoft.com/t5/sql-server-support/understanding-optimizer-timeout-and-how-complex-queries-can-be/ba-p/319188 > >>> > >>> <https://techcommunity.microsoft.com/t5/sql-server-support/understanding-optimizer-timeout-and-how-complex-queries-can-be/ba-p/319188> > >>> > >>> Regards, > >>> Haisheng > >>> > >>> On 2020/04/19 11:31:27, Seliverstov Igor <gvvinbl...@gmail.com > >>> <mailto:gvvinbl...@gmail.com>> wrote: > >>>> Haisheng, > >>>> > >>>> Ok, then such notification isn't needed > >>>> > >>>> But in this case we don't have any control over how long planning takes > >>>> > >>>> For some systems it's necessary to get good enough plan right now instead > >>>> of best one after while > >>>> > >>>> For example we've been considering a case when a query is optimised > >>>> several > >>>> times in short iterations in case it's impossible to get the best plan in > >>>> reasonable period of time (for example there is an SLA for response time) > >>>> > >>>> This mean we need all needed physical implementations after each logical > >>>> transformation is applied. > >>>> > >>>> Regards, > >>>> Igor > >>>> > >>>> > >>>> вс, 19 апр. 2020 г., 13:55 Haisheng Yuan <hy...@apache.org > >>>> <mailto:hy...@apache.org>>: > >>>> > >>>>> Hi Igor, > >>>>> > >>>>> There will be no rule queue anymore. Y will be fully explored (logical > >>>>> rules are matched and applied) before it can be implemented and > >>>>> optimized. > >>>>> > >>>>> Thanks, > >>>>> Haisheng > >>>>> > >>>>> On 2020/04/19 10:11:45, Seliverstov Igor <gvvinbl...@gmail.com > >>>>> <mailto:gvvinbl...@gmail.com>> wrote: > >>>>>> Hi Haisheng, > >>>>>> > >>>>>> Great explanation, thanks. > >>>>>> > >>>>>> One thing I'd like to cover in advance is trait propagation process (I > >>>>> need > >>>>>> it for Apache Ignite SQL engine implementation). > >>>>>> > >>>>>> For example: > >>>>>> > >>>>>> There are two nodes: Rel X and its child node Rel Y > >>>>>> > >>>>>> Both nodes are in Optimized state, and there is a Logical rule for Rel > >>>>>> Y > >>>>> in > >>>>>> a rules queue matched, > >>>>>> > >>>>>> After logical rule is applied, there is a logical rel Rel Y' and > >>>>> containing > >>>>>> it set moves into Exploring state (since there is a logical node which > >>>>> has > >>>>>> to be implemented) > >>>>>> > >>>>>> After whole process Exploring -> ... -> Optimized we need to force > >>>>>> parent > >>>>>> Rel X set to switch its state from Optimized to Optimizing to peek the > >>>>>> newly implemented child and (possible) produce a new Rel X' physical > >>>>>> rel > >>>>> on > >>>>>> the basis of previously requested from the Rel X set traits. > >>>>>> > >>>>>> If a new Rel X' is produced, a Rel X' parent should move its state > >>>>>> Optimized -> Optimizing and repeat described above operations. > >>>>>> > >>>>>> Does it look like true? > >>>>>> > >>>>>> Regards, > >>>>>> Igor > >>>>>> > >>>>>> > >>>>>> вс, 19 апр. 2020 г., 6:52 Haisheng Yuan <h.y...@alibaba-inc.com > >>>>>> <mailto:h.y...@alibaba-inc.com>>: > >>>>>> > >>>>>>> Hi, > >>>>>>> > >>>>>>> In the past few months, we have discussed a lot about Cascades style > >>>>>>> top-down optimization, including on-demand trait derivation/request, > >>>>> rule > >>>>>>> apply, branch and bound space pruning. Now we think it is time to move > >>>>>>> towards these targets. > >>>>>>> > >>>>>>> We will separate it into several small issues, and each one can be > >>>>>>> integrated as a standalone, independent feature, and most importantly, > >>>>>>> meanwhile keep backward compatibility. > >>>>>>> > >>>>>>> 1. Top-down trait request > >>>>>>> In other words, pass traits requirements from parent nodes to child > >>>>> nodes. > >>>>>>> The trait requests happens after all the logical transformation rules > >>>>> and > >>>>>>> physical implementation rules are done, in a top-down manner, driven > >>>>> from > >>>>>>> root set. e.g.: > >>>>>>> SELECT a, sum(c) FROM > >>>>>>> (SELECT * FROM R JOIN S USING (a, b)) t > >>>>>>> GROUP BY a; > >>>>>>> > >>>>>>> Suppose we have the following plan tree in the MEMO, and let's only > >>>>>>> consider distribution for simplicity, each group represents a RelSet > >>>>> in the > >>>>>>> MEMO. > >>>>>>> > >>>>>>> Group 1: Agg on [a] > >>>>>>> Group 2: +-- MergeJoin on [a, b] > >>>>>>> Group 3: |--- TableScan R > >>>>>>> Group 4: +--- TableScan S > >>>>>>> | Group No | Operator | Derived Traits | Required Traits | > >>>>>>> | ----------- | ------------- | --------------- | --------------- | > >>>>>>> | Group 1 | Aggregate | Hash[a] | N/A | > >>>>>>> | Group 2 | MergeJoin | Hash[a,b] | Hash[a] | > >>>>>>> | Group 3 | TableScan R | None | Hash[a,b] | > >>>>>>> | Group 4 | TableScan S | None | Hash[a,b] | > >>>>>>> > >>>>>>> We will add new interface PhysicalNode (or extending RelNode) with > >>>>>>> methods: > >>>>>>> > >>>>>>> - Pair<RelTraitSet,List<RelTraitSet>> requireTraits(RelTraitSet > >>>>> required); > >>>>>>> pair.left is the current operator's new traitset, pair.right is the > >>>>> list > >>>>>>> of children's required traitset. > >>>>>>> > >>>>>>> - RelNode passThrough(RelTraitSet required); > >>>>>>> Default implementation will call above method requireTraits() and > >>>>>>> RelNode.copy() to create new RelNode. Available to be overriden for > >>>>>>> physical operators to customize the logic. > >>>>>>> > >>>>>>> The planner will call above method on MergeJoin operator to pass the > >>>>>>> required traits (Hash[a]) to Mergejoin's child operators. > >>>>>>> > >>>>>>> We will get a new MergeJoin: > >>>>>>> MergeJoin hash[a] > >>>>>>> |---- TableScan R hash[a] (RelSubset) > >>>>>>> +---- TableScan S hash[a] (RelSubset) > >>>>>>> > >>>>>>> Now the MEMO group looks like: > >>>>>>> | Group No | Operator | Derived Traits | Required Traits > >>>>> | > >>>>>>> | ---------- | -------- ----- | -------------------- | > >>>>>>> --------------------- | > >>>>>>> | Group1 | Aggregate | Hash[a] | N/A > >>>>>>> | > >>>>>>> | Group2 | MergeJoin | Hash[a,b], Hash[a]| Hash[a] > >>>>>>> | > >>>>>>> | Group3 | TableScan R | None | Hash[a,b], > >>>>> Hash[a] > >>>>>>> | > >>>>>>> | Group4 | TableScan S | None | Hash[a,b], > >>>>> Hash[a] > >>>>>>> | > >>>>>>> > >>>>>>> Calcite user may choose to ignore / not implement the interface to > >>>>>>> keep > >>>>>>> the original behavior. Each physical operator, according to its own > >>>>> logic, > >>>>>>> decides whether passThrough() should pass traits down or not by > >>>>> returning: > >>>>>>> - a non-null RelNode, which means it can pass down > >>>>>>> - null object, which means can't pass down > >>>>>>> > >>>>>>> 2. Provide option to disable AbstractConverter > >>>>>>> Once the plan can request traits in top-down way in the framework, > >>>>>>> many > >>>>>>> system don't need AbstractConverter anymore, since it is just a > >>>>>>> intermediate operator to generate physical sort / exchange. For those, > >>>>> we > >>>>>>> can provide option to disable AbstractConverter, generate physical > >>>>> enforcer > >>>>>>> directly by adding a method to interface Convention: > >>>>>>> - RelNode enforce(RelNode node, RelTraitSet traits); > >>>>>>> > >>>>>>> The default implementation may just calling > >>>>> changeTraitsUsingConverters(), > >>>>>>> but people may choose to override it if the system has special needs, > >>>>> like > >>>>>>> several traits must implement together, or the position of collation > >>>>>>> in > >>>>>>> RelTraitSet is before distribution. > >>>>>>> > >>>>>>> 3. Top-down driven, on-demand rule apply > >>>>>>> For every RelNode in a RelSet, rule is matched and applied > >>>>> sequentially, > >>>>>>> newly generated RelNodes are added to the end of RelNode list in the > >>>>> RelSet > >>>>>>> waiting for rule apply. RuleQueue and DeferringRuleCall is not needed > >>>>>>> anymore. This will make space pruning and rule mutual exclusivity > >>>>>>> check > >>>>>>> possible. > >>>>>>> > >>>>>>> There are 3 stages for each RelSet: > >>>>>>> a). Exploration: logical transformation, yields logical nodes > >>>>>>> b). Implementation: physical transformation, yields physical nodes > >>>>>>> c). Optimization: trait request, enforcement > >>>>>>> > >>>>>>> The general process looks like: > >>>>>>> - optimize RelSet X: > >>>>>>> implement RelSet X > >>>>>>> for each physical relnode in RelSet X: > >>>>>>> // pass down trait requests to child RelSets > >>>>>>> for each child RelSet Y of current relnode: > >>>>>>> optimize RelSet Y > >>>>>>> > >>>>>>> - implement RelSet X: > >>>>>>> if X has been implemented: > >>>>>>> return > >>>>>>> explore RelSet X > >>>>>>> for each relnode in RelSet X: > >>>>>>> apply physical rules > >>>>>>> - explore RelSet X: > >>>>>>> if X has been explored > >>>>>>> return > >>>>>>> for each relnode in RelSet X: > >>>>>>> // ensure each child RelSet of current relnode is explored > >>>>>>> for each child RelSet Y of current relnode: > >>>>>>> explore RelSet Y > >>>>>>> apply logical rules on current relnode > >>>>>>> > >>>>>>> Basically it is a state machine with several states: Initialized, > >>>>>>> Explored, Exploring, Implemented, Implementing, Optimized, Optimizing > >>>>> and > >>>>>>> several transition methods: exploreRelSet, exploreRelNode, > >>>>> implementRelSet, > >>>>>>> implementRelNode, optimizeRelSet, optimizeRelNode... > >>>>>>> > >>>>>>> To achieve this, we need to mark the rules either logical rule or > >>>>> physical > >>>>>>> rule. > >>>>>>> To keep backward compatibility, all the un-marked rules will be > >>>>> treated as > >>>>>>> logical rules, except rules that uses AbstractConverter as rule > >>>>> operand, > >>>>>>> these rules still need to applied top-down, or random order. > >>>>>>> > >>>>>>> > >>>>>>> 4. On-demand, bottom-up trait derivation > >>>>>>> It is called bottom-up, but actually driven by top-down, happens same > >>>>> time > >>>>>>> as top-down trait request, in optimization stage mentioned above. Many > >>>>>>> Calcite based bigdata system only propagate traits on Project and > >>>>> Filter by > >>>>>>> writing rules, which is very limited. In fact, we can extend trait > >>>>>>> propagation/derivation to all the operators, without rules, by adding > >>>>>>> interface PhysicalNode (or extending RelNode) with method: > >>>>>>> - RelNode derive(RelTraitSet traits, int childId); > >>>>>>> > >>>>>>> Given the following plan (only consider distribution for simplicity): > >>>>>>> Agg [a,b] > >>>>>>> +-- MergeJoin [a] > >>>>>>> |---- TableScan R > >>>>>>> +--- TableScan S > >>>>>>> > >>>>>>> Hash[a] won't satisfy Hash[a,b] without special treatment, because > >>>>> there > >>>>>>> isn't a mechanism to coordinate traits between children. > >>>>>>> > >>>>>>> Now we call derive method on Agg [a,b] node, derive(Hash[a], 0), we > >>>>>>> get > >>>>>>> the new node: > >>>>>>> Agg [a] > >>>>>>> +-- MergeJoin [a] (RelSubset) > >>>>>>> > >>>>>>> We will provide different matching type, so each operator can specify > >>>>> what > >>>>>>> kind of matching type it requires its children: > >>>>>>> - MatchType getMatchType(RelTrait trait, int childId); > >>>>>>> > >>>>>>> a) Exact: Hash[a,b] exact match Hash[a,b], aka, satisfy > >>>>>>> b) Partial: Hash[a] partial match Hash[a,b] > >>>>>>> c) Permuted: Sort[a,b,c] permuted match Sort[c,b,a] > >>>>>>> > >>>>>>> In addition, optimization order is provided for each opertor to > >>>>> specify: > >>>>>>> a) left to right > >>>>>>> b) right to left > >>>>>>> c) both > >>>>>>> > >>>>>>> For example, for query SELECT * FROM R join S on R.a=S.a and R.b=S.b > >>>>> and > >>>>>>> R.c=S.c: > >>>>>>> Suppose R is distributed by a,b, and S is distributed by c. > >>>>>>> MergeJoin [a,b,c] > >>>>>>> |--- TableScan R [a,b] > >>>>>>> +-- TableScan S [c] > >>>>>>> a) left to right, call derive(Hash[a,b], 0), we get MergeJoin [a,b] > >>>>>>> b) right to left, call derive(Hash[c], 1), we get MergeJoin [c], most > >>>>>>> likely a bad plan > >>>>>>> c) both, get above 2 plans. > >>>>>>> > >>>>>>> For system that doesn't have a fine-tuned stats and cost model, it may > >>>>> not > >>>>>>> be able to make a right decision based purely on cost. Probably we > >>>>> need to > >>>>>>> provide the MergeJoin with both children's derived traitset list. > >>>>>>> - List<RelNode> derive(List<List<RelTraitSet>>); > >>>>>>> > >>>>>>> Of course, all above methods are optional to implement for those who > >>>>>>> doesn't need this feature. > >>>>>>> > >>>>>>> 5. Branch and Bound Space Pruning > >>>>>>> After we implement on-demand, top-down trait enforcement and > >>>>> rule-apply, > >>>>>>> we can pass the cost limit at the time of passing down required > >>>>> traits, as > >>>>>>> described in the classical Cascades paper. Right now, Calcite doesn't > >>>>>>> provide group level logical properties, including stats info, each > >>>>> operator > >>>>>>> in the same group has its own logical property and the stats may vary, > >>>>> so > >>>>>>> we can only do limited space pruning for trait enforcement, still > >>>>> good. But > >>>>>>> if we agree to add option to share group level stats between relnodes > >>>>> in a > >>>>>>> RelSet, we will be able to do more aggresive space pruning, which will > >>>>> help > >>>>>>> boost the performance of join reorder planning. > >>>>>>> > >>>>>>> > >>>>>>> With all that being said, how do we move forward? > >>>>>>> > >>>>>>> There are 2 ways: > >>>>>>> a) Modify on current VolcanoPlanner. > >>>>>>> Pros: code reuse, existing numerous test cases and infrastructure, > >>>>> fast > >>>>>>> integration > >>>>>>> Cons: changing code always brings risk > >>>>>>> > >>>>>>> b) Add a new XXXXPlanner > >>>>>>> Pros: no risk, no tech debt, no need to worry about backward > >>>>>>> compatability > >>>>>>> Cons: separate test cases for new planner, one more planner to > >>>>> maintain > >>>>>>> > >>>>>>> We'd like to hear the community's thoughts and advices. > >>>>>>> > >>>>>>> Thanks. > >>>>>>> - Haisheng > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>> > >>>>> > >>>> > >> > >> >