Re: [DISCUSS] Towards Cascades Optimizer

Seliverstov Igor Mon, 20 Apr 2020 04:18:24 -0700

Haisheng, Xiening,

Ok, Now I see how it should work.


Thanks for your replies.

Regards,
Igor

> 20 апр. 2020 г., в 09:56, Seliverstov Igor <gvvinbl...@gmail.com> написал(а):
> 
> Haisheng, Xiening,
> 
> Thanks for clarifying. 
> 
> In this proposal, we are not trying to split logical and physical planning 
> entirely. - actually I was in doubt about an idea of entire splitting logical 
> and physical phases, if you aren't going to, I have no objections.
> 
> But it returns me to my first question: how we will propagate traits in 
> bottom-up manner using proposed approach. (See [DISCUSS] Proposal to add API 
> to force rules matching specific rels for details)
> 
> One of inconveniences of current VolcanoPlanner implementation is amount of 
> tricks that we need to get desired behaviour. It would be great if some of 
> issues (or all of them) were solved in the new approach.
> 
> Regards,
> Igor
> 
> пн, 20 апр. 2020 г., 7:02 Xiening Dai <xndai....@gmail.com 
> <mailto:xndai....@gmail.com>>:
> Hi Igor,
> 
> Your comment - "because actual cost may be calculated correctly using 
> physical operators only. So won't be able to implement Branch and Bound Space 
> Pruning.“ is actually not true. In Cascade’s lower bound / upper bound 
> pruning algorithm, you can get cost lower bound of input RelNode using 
> cardinality * unit_copy_cost. The unit_copy_cost is a constant, which stands 
> for the minimal cost for processing a tuple from the input. So this will be 
> the minimal cost the input RelNode can achieve, and if this is indeed larger 
> than the current best cost, this input node can be pruned. 
> 
> In this proposal, we are not trying to split logical and physical planning 
> entirely. But for any given equivalent set, we would need to finish 
> exploration before implementation. But for the entire memo tree, each set 
> could be in different planning stage, and an equivalent set can be pruned 
> even before it’s implemented. A text book example of aforementioned “lower 
> bound / upper bound pruning” would be the join ordering case.
> 
> Regarding #3, I think we can still achieve that (partially) through this 
> proposal. Remember every time when we start the optimization, we pass down an 
> upper bound limit. Initially this upper bound for root is infinite - meaning 
> that no feasible plan is available. Every time when we find a physical plan 
> we update this upper bound, then start the search again. We could stop the 
> search when the cost is less than a pre-defined threshold - which gives you a 
> “good enough” plan with early termination. Still this wouldn't avoid the 
> logical exploration. For that, you would probably archive through rule 
> configurations, and avoid some expansive transformation to keep the cost 
> down. 
> 
> 
> > On Apr 19, 2020, at 7:30 PM, Haisheng Yuan <hy...@apache.org 
> > <mailto:hy...@apache.org>> wrote:
> > 
> > Igor,
> > 
> > a) Given current Calcite's stats derivation strategy, mixing logical and 
> > physical planning won't make it better. I hope you went through my email to 
> > the end, currently operators inside a memo group don't share stats info, 
> > each operator's stats may differ with the other one, and the RelSubset's 
> > best node and stats may vary during optimization. So logical and physical 
> > rule pruning are not safe at the moment, otherwise it almost implies 
> > changing downstream project's cost model silently. 
> > 
> > On the other hand, ensuring child group exploration task finish first will 
> > make rule mutual exclusivity check possible, like the result of 
> > ReduceExpressionRule won't need trigger the same rule again, The join 
> > result of JoinCommuteRule won't need trigger JoinCommuteRule and 
> > ReduceExpressionRule again. 
> > 
> > More importantly, most if not all the the long planning queries in Calcite 
> > are not caused by too many alternatives, but mutual triggering, or cyclic 
> > triggering, which can be avoided by customizing the rules instead of using 
> > the default one. Unless you use dynamic programming (Calcite use heuristic) 
> > on join reordering (left-deep, right-deep, bushy), space pruning won't help 
> > make long / infinite running query faster.
> > 
> > b) No evidence shows current version of Calcite will return the most 
> > promising plan in first planning iteration. Instead of praying for getting 
> > good enough plan in the first iteration, why not focus on fixing rules that 
> > causes the issue?
> > 
> > c) That is not the goal.
> > 
> > On 2020/04/19 15:14:57, Seliverstov Igor <gvvinbl...@gmail.com 
> > <mailto:gvvinbl...@gmail.com>> wrote: 
> >> Haisheng,
> >> 
> >> From my point of view splitting logical and physical planning parts isn’t 
> >> a good idea. 
> >> 
> >> I think so because actual cost may be calculated correctly using physical 
> >> operators only. So that we:
> >> a) won't be able to implement Branch and Bound Space Pruning (as far as I 
> >> understand, at exploring time there are no physical operators, no correct 
> >> costs, but assumptions only, I don’t think we should rely on them)
> >> b) won’t be able to get good enough plan (without Branch and Bound Space 
> >> Pruning it’s non-trivial task to get right search direction and most 
> >> promising plans in first planning iterations)
> >> c) won’t be able to tune the good enough plan during several similar 
> >> queries are executed
> >> 
> >> Regards,
> >> Igor
> >> 
> >> 
> >>> 19 апр. 2020 г., в 17:37, Haisheng Yuan <hy...@apache.org 
> >>> <mailto:hy...@apache.org>> написал(а):
> >>> 
> >>> Hi Igor,
> >>> 
> >>> You can't have your cake and eat it.
> >>> But one feasible work item definitely we can do is that once timeout, 
> >>> stop exploring, use the first available physical operator in each group 
> >>> and optimize it.
> >>> 
> >>> Because most, if not all, of the long / infinite running optimizations 
> >>> are caused by project related transpose, join reordering (join commute + 
> >>> associative), constant reduction for large expression (see CALCITE-3460), 
> >>> all of which are logical transformations rules and many of which have 
> >>> corresponding JIRAs. So another alternative is, let's fix these bugs to 
> >>> improve Calcite to make timeout option less usable.
> >>> 
> >>> Another thing worth noting is that sql server optimizer timeout mechanism 
> >>> is not based on clock time, but the number of optimization tasks it has 
> >>> done [1].
> >>> 
> >>> [1] 
> >>> https://techcommunity.microsoft.com/t5/sql-server-support/understanding-optimizer-timeout-and-how-complex-queries-can-be/ba-p/319188
> >>>  
> >>> <https://techcommunity.microsoft.com/t5/sql-server-support/understanding-optimizer-timeout-and-how-complex-queries-can-be/ba-p/319188>
> >>> 
> >>> Regards,
> >>> Haisheng 
> >>> 
> >>> On 2020/04/19 11:31:27, Seliverstov Igor <gvvinbl...@gmail.com 
> >>> <mailto:gvvinbl...@gmail.com>> wrote: 
> >>>> Haisheng,
> >>>> 
> >>>> Ok, then such notification isn't needed
> >>>> 
> >>>> But in this case we don't have any control over how long planning takes
> >>>> 
> >>>> For some systems it's necessary to get good enough plan right now instead
> >>>> of best one after while
> >>>> 
> >>>> For example we've been considering a case when a query is optimised 
> >>>> several
> >>>> times in short iterations in case it's impossible to get the best plan in
> >>>> reasonable period of time (for example there is an SLA for response time)
> >>>> 
> >>>> This mean we need all needed physical implementations after each logical
> >>>> transformation is applied.
> >>>> 
> >>>> Regards,
> >>>> Igor
> >>>> 
> >>>> 
> >>>> вс, 19 апр. 2020 г., 13:55 Haisheng Yuan <hy...@apache.org 
> >>>> <mailto:hy...@apache.org>>:
> >>>> 
> >>>>> Hi Igor,
> >>>>> 
> >>>>> There will be no rule queue anymore. Y will be fully explored (logical
> >>>>> rules are matched and applied) before it can be implemented and 
> >>>>> optimized.
> >>>>> 
> >>>>> Thanks,
> >>>>> Haisheng
> >>>>> 
> >>>>> On 2020/04/19 10:11:45, Seliverstov Igor <gvvinbl...@gmail.com 
> >>>>> <mailto:gvvinbl...@gmail.com>> wrote:
> >>>>>> Hi Haisheng,
> >>>>>> 
> >>>>>> Great explanation, thanks.
> >>>>>> 
> >>>>>> One thing I'd like to cover in advance is trait propagation process (I
> >>>>> need
> >>>>>> it for Apache Ignite SQL engine implementation).
> >>>>>> 
> >>>>>> For example:
> >>>>>> 
> >>>>>> There are two nodes: Rel X and its child node Rel Y
> >>>>>> 
> >>>>>> Both nodes are in Optimized state, and there is a Logical rule for Rel 
> >>>>>> Y
> >>>>> in
> >>>>>> a rules queue matched,
> >>>>>> 
> >>>>>> After logical rule is applied, there is a logical rel Rel Y' and
> >>>>> containing
> >>>>>> it set moves into Exploring state (since there is a logical node which
> >>>>> has
> >>>>>> to be implemented)
> >>>>>> 
> >>>>>> After whole process Exploring -> ... -> Optimized we need to force 
> >>>>>> parent
> >>>>>> Rel X set to switch its state from Optimized to Optimizing to peek the
> >>>>>> newly implemented child and (possible) produce a new Rel X' physical 
> >>>>>> rel
> >>>>> on
> >>>>>> the basis of previously requested from the Rel X set traits.
> >>>>>> 
> >>>>>> If a new Rel X' is produced, a Rel X' parent should move its state
> >>>>>> Optimized -> Optimizing and repeat described above operations.
> >>>>>> 
> >>>>>> Does it look like true?
> >>>>>> 
> >>>>>> Regards,
> >>>>>> Igor
> >>>>>> 
> >>>>>> 
> >>>>>> вс, 19 апр. 2020 г., 6:52 Haisheng Yuan <h.y...@alibaba-inc.com 
> >>>>>> <mailto:h.y...@alibaba-inc.com>>:
> >>>>>> 
> >>>>>>> Hi,
> >>>>>>> 
> >>>>>>> In the past few months, we have discussed a lot about Cascades style
> >>>>>>> top-down optimization, including on-demand trait derivation/request,
> >>>>> rule
> >>>>>>> apply, branch and bound space pruning. Now we think it is time to move
> >>>>>>> towards these targets.
> >>>>>>> 
> >>>>>>> We will separate it into several small issues, and each one can be
> >>>>>>> integrated as a standalone, independent feature, and most importantly,
> >>>>>>> meanwhile keep backward compatibility.
> >>>>>>> 
> >>>>>>> 1. Top-down trait request
> >>>>>>> In other words, pass traits requirements from parent nodes to child
> >>>>> nodes.
> >>>>>>> The trait requests happens after all the logical transformation rules
> >>>>> and
> >>>>>>> physical implementation rules are done, in a top-down manner, driven
> >>>>> from
> >>>>>>> root set. e.g.:
> >>>>>>> SELECT a, sum(c) FROM
> >>>>>>>   (SELECT * FROM R JOIN S USING (a, b)) t
> >>>>>>> GROUP BY a;
> >>>>>>> 
> >>>>>>> Suppose we have the following plan tree in the MEMO, and let's only
> >>>>>>> consider distribution for simplicity, each group represents a RelSet
> >>>>> in the
> >>>>>>> MEMO.
> >>>>>>> 
> >>>>>>>  Group 1:     Agg on [a]
> >>>>>>>  Group 2:           +-- MergeJoin on [a, b]
> >>>>>>>  Group 3:                       |--- TableScan R
> >>>>>>>  Group 4:                       +--- TableScan S
> >>>>>>>  | Group No | Operator    | Derived Traits | Required Traits |
> >>>>>>>  | ----------- | ------------- | --------------- | --------------- |
> >>>>>>>  | Group 1  | Aggregate    | Hash[a]         | N/A                |
> >>>>>>>  | Group 2  | MergeJoin    | Hash[a,b]      | Hash[a]          |
> >>>>>>>  | Group 3  | TableScan R | None            | Hash[a,b]       |
> >>>>>>>  | Group 4  | TableScan S | None            | Hash[a,b]       |
> >>>>>>> 
> >>>>>>> We will add new interface PhysicalNode (or extending RelNode) with
> >>>>>>> methods:
> >>>>>>> 
> >>>>>>> - Pair<RelTraitSet,List<RelTraitSet>> requireTraits(RelTraitSet
> >>>>> required);
> >>>>>>> pair.left is the current operator's new traitset, pair.right is the
> >>>>> list
> >>>>>>> of children's required traitset.
> >>>>>>> 
> >>>>>>> - RelNode passThrough(RelTraitSet required);
> >>>>>>> Default implementation will call above method requireTraits() and
> >>>>>>> RelNode.copy() to create new RelNode. Available to be overriden for
> >>>>>>> physical operators to customize the logic.
> >>>>>>> 
> >>>>>>> The planner will call above method on MergeJoin operator to pass the
> >>>>>>> required traits (Hash[a]) to Mergejoin's child operators.
> >>>>>>> 
> >>>>>>> We will get a new MergeJoin:
> >>>>>>> MergeJoin hash[a]
> >>>>>>>         |---- TableScan R hash[a] (RelSubset)
> >>>>>>>         +---- TableScan S hash[a] (RelSubset)
> >>>>>>> 
> >>>>>>> Now the MEMO group looks like:
> >>>>>>>  | Group No | Operator    | Derived Traits       | Required Traits
> >>>>>   |
> >>>>>>>  | ---------- | -------- ----- | -------------------- |
> >>>>>>> --------------------- |
> >>>>>>>  | Group1   | Aggregate   | Hash[a]                 | N/A
> >>>>>>>       |
> >>>>>>>  | Group2   | MergeJoin   | Hash[a,b], Hash[a]| Hash[a]
> >>>>>>> |
> >>>>>>>  | Group3   | TableScan R | None                    | Hash[a,b],
> >>>>> Hash[a]
> >>>>>>> |
> >>>>>>>  | Group4   | TableScan S | None                    | Hash[a,b],
> >>>>> Hash[a]
> >>>>>>> |
> >>>>>>> 
> >>>>>>> Calcite user may choose to ignore / not implement the interface to 
> >>>>>>> keep
> >>>>>>> the original behavior. Each physical operator, according to its own
> >>>>> logic,
> >>>>>>> decides whether passThrough() should pass traits down or not by
> >>>>> returning:
> >>>>>>> - a non-null RelNode, which means it can pass down
> >>>>>>> - null object, which means can't pass down
> >>>>>>> 
> >>>>>>> 2. Provide option to disable AbstractConverter
> >>>>>>> Once the plan can request traits in top-down way in the framework, 
> >>>>>>> many
> >>>>>>> system don't need AbstractConverter anymore, since it is just a
> >>>>>>> intermediate operator to generate physical sort / exchange. For those,
> >>>>> we
> >>>>>>> can provide option to disable AbstractConverter, generate physical
> >>>>> enforcer
> >>>>>>> directly by adding a method to interface Convention:
> >>>>>>> - RelNode enforce(RelNode node, RelTraitSet traits);
> >>>>>>> 
> >>>>>>> The default implementation may just calling
> >>>>> changeTraitsUsingConverters(),
> >>>>>>> but people may choose to override it if the system has special needs,
> >>>>> like
> >>>>>>> several traits must implement together, or the position of collation 
> >>>>>>> in
> >>>>>>> RelTraitSet is before distribution.
> >>>>>>> 
> >>>>>>> 3. Top-down driven, on-demand rule apply
> >>>>>>> For every RelNode in a RelSet, rule is matched and applied
> >>>>> sequentially,
> >>>>>>> newly generated RelNodes are added to the end of RelNode list in the
> >>>>> RelSet
> >>>>>>> waiting for rule apply. RuleQueue and DeferringRuleCall is not needed
> >>>>>>> anymore. This will make space pruning and rule mutual exclusivity 
> >>>>>>> check
> >>>>>>> possible.
> >>>>>>> 
> >>>>>>> There are 3 stages for each RelSet:
> >>>>>>> a). Exploration: logical transformation, yields logical nodes
> >>>>>>> b). Implementation: physical transformation, yields physical nodes
> >>>>>>> c). Optimization: trait request, enforcement
> >>>>>>> 
> >>>>>>> The general process looks like:
> >>>>>>> - optimize RelSet X:
> >>>>>>> implement RelSet X
> >>>>>>>   for each physical relnode in RelSet X:
> >>>>>>>       // pass down trait requests to child RelSets
> >>>>>>>       for each child RelSet Y of current relnode:
> >>>>>>>           optimize RelSet Y
> >>>>>>> 
> >>>>>>> - implement RelSet X:
> >>>>>>>   if X has been implemented:
> >>>>>>>       return
> >>>>>>> explore RelSet X
> >>>>>>>   for each relnode in RelSet X:
> >>>>>>>       apply physical rules
> >>>>>>> - explore RelSet X:
> >>>>>>>    if X has been explored
> >>>>>>>       return
> >>>>>>> for each relnode in RelSet X:
> >>>>>>>       // ensure each child RelSet of current relnode is explored
> >>>>>>> for each child RelSet Y of current relnode:
> >>>>>>>          explore RelSet Y
> >>>>>>>       apply logical rules on current relnode
> >>>>>>> 
> >>>>>>> Basically it is a state machine with several states: Initialized,
> >>>>>>> Explored, Exploring, Implemented, Implementing, Optimized, Optimizing
> >>>>> and
> >>>>>>> several transition methods: exploreRelSet, exploreRelNode,
> >>>>> implementRelSet,
> >>>>>>> implementRelNode, optimizeRelSet, optimizeRelNode...
> >>>>>>> 
> >>>>>>> To achieve this, we need to mark the rules either logical rule or
> >>>>> physical
> >>>>>>> rule.
> >>>>>>> To keep backward compatibility, all the un-marked rules will be
> >>>>> treated as
> >>>>>>> logical rules, except rules that uses AbstractConverter as rule
> >>>>> operand,
> >>>>>>> these rules still need to applied top-down, or random order.
> >>>>>>> 
> >>>>>>> 
> >>>>>>> 4. On-demand, bottom-up trait derivation
> >>>>>>> It is called bottom-up, but actually driven by top-down, happens same
> >>>>> time
> >>>>>>> as top-down trait request, in optimization stage mentioned above. Many
> >>>>>>> Calcite based bigdata system only propagate traits on Project and
> >>>>> Filter by
> >>>>>>> writing rules, which is very limited. In fact, we can extend trait
> >>>>>>> propagation/derivation to all the operators, without rules, by adding
> >>>>>>> interface PhysicalNode (or extending RelNode) with method:
> >>>>>>> - RelNode derive(RelTraitSet traits, int childId);
> >>>>>>> 
> >>>>>>> Given the following plan (only consider distribution for simplicity):
> >>>>>>>  Agg [a,b]
> >>>>>>>     +-- MergeJoin [a]
> >>>>>>>                |---- TableScan R
> >>>>>>>                +--- TableScan S
> >>>>>>> 
> >>>>>>> Hash[a] won't satisfy Hash[a,b] without special treatment, because
> >>>>> there
> >>>>>>> isn't a mechanism to coordinate traits between children.
> >>>>>>> 
> >>>>>>> Now we call derive method on Agg [a,b] node, derive(Hash[a], 0), we 
> >>>>>>> get
> >>>>>>> the new node:
> >>>>>>> Agg [a]
> >>>>>>> +-- MergeJoin [a] (RelSubset)
> >>>>>>> 
> >>>>>>> We will provide different matching type, so each operator can specify
> >>>>> what
> >>>>>>> kind of matching type it requires its children:
> >>>>>>> - MatchType getMatchType(RelTrait trait, int childId);
> >>>>>>> 
> >>>>>>> a) Exact: Hash[a,b] exact match Hash[a,b], aka, satisfy
> >>>>>>> b) Partial: Hash[a] partial match Hash[a,b]
> >>>>>>> c) Permuted: Sort[a,b,c] permuted match Sort[c,b,a]
> >>>>>>> 
> >>>>>>> In addition, optimization order is provided for each opertor to
> >>>>> specify:
> >>>>>>> a) left to right
> >>>>>>> b) right to left
> >>>>>>> c) both
> >>>>>>> 
> >>>>>>> For example, for query SELECT * FROM R join S on R.a=S.a and R.b=S.b
> >>>>> and
> >>>>>>> R.c=S.c:
> >>>>>>> Suppose R is distributed by a,b, and S is distributed by c.
> >>>>>>> MergeJoin [a,b,c]
> >>>>>>>      |--- TableScan R [a,b]
> >>>>>>>      +-- TableScan S [c]
> >>>>>>> a) left to right, call derive(Hash[a,b], 0), we get MergeJoin [a,b]
> >>>>>>> b) right to left, call derive(Hash[c], 1), we get MergeJoin [c], most
> >>>>>>> likely a bad plan
> >>>>>>> c) both, get above 2 plans.
> >>>>>>> 
> >>>>>>> For system that doesn't have a fine-tuned stats and cost model, it may
> >>>>> not
> >>>>>>> be able to make a right decision based purely on cost. Probably we
> >>>>> need to
> >>>>>>> provide the MergeJoin with both children's derived traitset list.
> >>>>>>> - List<RelNode> derive(List<List<RelTraitSet>>);
> >>>>>>> 
> >>>>>>> Of course, all above methods are optional to implement for those who
> >>>>>>> doesn't need this feature.
> >>>>>>> 
> >>>>>>> 5. Branch and Bound Space Pruning
> >>>>>>> After we implement on-demand, top-down trait enforcement and
> >>>>> rule-apply,
> >>>>>>> we can pass the cost limit at the time of passing down required
> >>>>> traits, as
> >>>>>>> described in the classical Cascades paper. Right now, Calcite doesn't
> >>>>>>> provide group level logical properties, including stats info, each
> >>>>> operator
> >>>>>>> in the same group has its own logical property and the stats may vary,
> >>>>> so
> >>>>>>> we can only do limited space pruning for trait enforcement, still
> >>>>> good. But
> >>>>>>> if we agree to add option to share group level stats between relnodes
> >>>>> in a
> >>>>>>> RelSet, we will be able to do more aggresive space pruning, which will
> >>>>> help
> >>>>>>> boost the performance of join reorder planning.
> >>>>>>> 
> >>>>>>> 
> >>>>>>> With all that being said, how do we move forward?
> >>>>>>> 
> >>>>>>> There are 2 ways:
> >>>>>>> a) Modify on current VolcanoPlanner.
> >>>>>>> Pros: code reuse, existing numerous test cases and infrastructure,
> >>>>> fast
> >>>>>>> integration
> >>>>>>> Cons: changing code always brings risk
> >>>>>>> 
> >>>>>>> b) Add a new XXXXPlanner
> >>>>>>> Pros: no risk, no tech debt, no need to worry about backward
> >>>>>>> compatability
> >>>>>>> Cons: separate test cases for new planner, one more planner to
> >>>>> maintain
> >>>>>>> 
> >>>>>>> We'd like to hear the community's thoughts and advices.
> >>>>>>> 
> >>>>>>> Thanks.
> >>>>>>> - Haisheng
> >>>>>>> 
> >>>>>>> 
> >>>>>>> 
> >>>>>>> 
> >>>>>>> 
> >>>>>>> 
> >>>>>>> 
> >>>>>> 
> >>>>> 
> >>>> 
> >> 
> >> 
>

Re: [DISCUSS] Towards Cascades Optimizer

Reply via email to