Hi Haisheng, Great explanation, thanks.
One thing I'd like to cover in advance is trait propagation process (I need it for Apache Ignite SQL engine implementation). For example: There are two nodes: Rel X and its child node Rel Y Both nodes are in Optimized state, and there is a Logical rule for Rel Y in a rules queue matched, After logical rule is applied, there is a logical rel Rel Y' and containing it set moves into Exploring state (since there is a logical node which has to be implemented) After whole process Exploring -> ... -> Optimized we need to force parent Rel X set to switch its state from Optimized to Optimizing to peek the newly implemented child and (possible) produce a new Rel X' physical rel on the basis of previously requested from the Rel X set traits. If a new Rel X' is produced, a Rel X' parent should move its state Optimized -> Optimizing and repeat described above operations. Does it look like true? Regards, Igor вс, 19 апр. 2020 г., 6:52 Haisheng Yuan <[email protected]>: > Hi, > > In the past few months, we have discussed a lot about Cascades style > top-down optimization, including on-demand trait derivation/request, rule > apply, branch and bound space pruning. Now we think it is time to move > towards these targets. > > We will separate it into several small issues, and each one can be > integrated as a standalone, independent feature, and most importantly, > meanwhile keep backward compatibility. > > 1. Top-down trait request > In other words, pass traits requirements from parent nodes to child nodes. > The trait requests happens after all the logical transformation rules and > physical implementation rules are done, in a top-down manner, driven from > root set. e.g.: > SELECT a, sum(c) FROM > (SELECT * FROM R JOIN S USING (a, b)) t > GROUP BY a; > > Suppose we have the following plan tree in the MEMO, and let's only > consider distribution for simplicity, each group represents a RelSet in the > MEMO. > > Group 1: Agg on [a] > Group 2: +-- MergeJoin on [a, b] > Group 3: |--- TableScan R > Group 4: +--- TableScan S > | Group No | Operator | Derived Traits | Required Traits | > | ----------- | ------------- | --------------- | --------------- | > | Group 1 | Aggregate | Hash[a] | N/A | > | Group 2 | MergeJoin | Hash[a,b] | Hash[a] | > | Group 3 | TableScan R | None | Hash[a,b] | > | Group 4 | TableScan S | None | Hash[a,b] | > > We will add new interface PhysicalNode (or extending RelNode) with > methods: > > - Pair<RelTraitSet,List<RelTraitSet>> requireTraits(RelTraitSet required); > pair.left is the current operator's new traitset, pair.right is the list > of children's required traitset. > > - RelNode passThrough(RelTraitSet required); > Default implementation will call above method requireTraits() and > RelNode.copy() to create new RelNode. Available to be overriden for > physical operators to customize the logic. > > The planner will call above method on MergeJoin operator to pass the > required traits (Hash[a]) to Mergejoin's child operators. > > We will get a new MergeJoin: > MergeJoin hash[a] > |---- TableScan R hash[a] (RelSubset) > +---- TableScan S hash[a] (RelSubset) > > Now the MEMO group looks like: > | Group No | Operator | Derived Traits | Required Traits | > | ---------- | -------- ----- | -------------------- | > --------------------- | > | Group1 | Aggregate | Hash[a] | N/A > | > | Group2 | MergeJoin | Hash[a,b], Hash[a]| Hash[a] > | > | Group3 | TableScan R | None | Hash[a,b], Hash[a] > | > | Group4 | TableScan S | None | Hash[a,b], Hash[a] > | > > Calcite user may choose to ignore / not implement the interface to keep > the original behavior. Each physical operator, according to its own logic, > decides whether passThrough() should pass traits down or not by returning: > - a non-null RelNode, which means it can pass down > - null object, which means can't pass down > > 2. Provide option to disable AbstractConverter > Once the plan can request traits in top-down way in the framework, many > system don't need AbstractConverter anymore, since it is just a > intermediate operator to generate physical sort / exchange. For those, we > can provide option to disable AbstractConverter, generate physical enforcer > directly by adding a method to interface Convention: > - RelNode enforce(RelNode node, RelTraitSet traits); > > The default implementation may just calling changeTraitsUsingConverters(), > but people may choose to override it if the system has special needs, like > several traits must implement together, or the position of collation in > RelTraitSet is before distribution. > > 3. Top-down driven, on-demand rule apply > For every RelNode in a RelSet, rule is matched and applied sequentially, > newly generated RelNodes are added to the end of RelNode list in the RelSet > waiting for rule apply. RuleQueue and DeferringRuleCall is not needed > anymore. This will make space pruning and rule mutual exclusivity check > possible. > > There are 3 stages for each RelSet: > a). Exploration: logical transformation, yields logical nodes > b). Implementation: physical transformation, yields physical nodes > c). Optimization: trait request, enforcement > > The general process looks like: > - optimize RelSet X: > implement RelSet X > for each physical relnode in RelSet X: > // pass down trait requests to child RelSets > for each child RelSet Y of current relnode: > optimize RelSet Y > > - implement RelSet X: > if X has been implemented: > return > explore RelSet X > for each relnode in RelSet X: > apply physical rules > - explore RelSet X: > if X has been explored > return > for each relnode in RelSet X: > // ensure each child RelSet of current relnode is explored > for each child RelSet Y of current relnode: > explore RelSet Y > apply logical rules on current relnode > > Basically it is a state machine with several states: Initialized, > Explored, Exploring, Implemented, Implementing, Optimized, Optimizing and > several transition methods: exploreRelSet, exploreRelNode, implementRelSet, > implementRelNode, optimizeRelSet, optimizeRelNode... > > To achieve this, we need to mark the rules either logical rule or physical > rule. > To keep backward compatibility, all the un-marked rules will be treated as > logical rules, except rules that uses AbstractConverter as rule operand, > these rules still need to applied top-down, or random order. > > > 4. On-demand, bottom-up trait derivation > It is called bottom-up, but actually driven by top-down, happens same time > as top-down trait request, in optimization stage mentioned above. Many > Calcite based bigdata system only propagate traits on Project and Filter by > writing rules, which is very limited. In fact, we can extend trait > propagation/derivation to all the operators, without rules, by adding > interface PhysicalNode (or extending RelNode) with method: > - RelNode derive(RelTraitSet traits, int childId); > > Given the following plan (only consider distribution for simplicity): > Agg [a,b] > +-- MergeJoin [a] > |---- TableScan R > +--- TableScan S > > Hash[a] won't satisfy Hash[a,b] without special treatment, because there > isn't a mechanism to coordinate traits between children. > > Now we call derive method on Agg [a,b] node, derive(Hash[a], 0), we get > the new node: > Agg [a] > +-- MergeJoin [a] (RelSubset) > > We will provide different matching type, so each operator can specify what > kind of matching type it requires its children: > - MatchType getMatchType(RelTrait trait, int childId); > > a) Exact: Hash[a,b] exact match Hash[a,b], aka, satisfy > b) Partial: Hash[a] partial match Hash[a,b] > c) Permuted: Sort[a,b,c] permuted match Sort[c,b,a] > > In addition, optimization order is provided for each opertor to specify: > a) left to right > b) right to left > c) both > > For example, for query SELECT * FROM R join S on R.a=S.a and R.b=S.b and > R.c=S.c: > Suppose R is distributed by a,b, and S is distributed by c. > MergeJoin [a,b,c] > |--- TableScan R [a,b] > +-- TableScan S [c] > a) left to right, call derive(Hash[a,b], 0), we get MergeJoin [a,b] > b) right to left, call derive(Hash[c], 1), we get MergeJoin [c], most > likely a bad plan > c) both, get above 2 plans. > > For system that doesn't have a fine-tuned stats and cost model, it may not > be able to make a right decision based purely on cost. Probably we need to > provide the MergeJoin with both children's derived traitset list. > - List<RelNode> derive(List<List<RelTraitSet>>); > > Of course, all above methods are optional to implement for those who > doesn't need this feature. > > 5. Branch and Bound Space Pruning > After we implement on-demand, top-down trait enforcement and rule-apply, > we can pass the cost limit at the time of passing down required traits, as > described in the classical Cascades paper. Right now, Calcite doesn't > provide group level logical properties, including stats info, each operator > in the same group has its own logical property and the stats may vary, so > we can only do limited space pruning for trait enforcement, still good. But > if we agree to add option to share group level stats between relnodes in a > RelSet, we will be able to do more aggresive space pruning, which will help > boost the performance of join reorder planning. > > > With all that being said, how do we move forward? > > There are 2 ways: > a) Modify on current VolcanoPlanner. > Pros: code reuse, existing numerous test cases and infrastructure, fast > integration > Cons: changing code always brings risk > > b) Add a new XXXXPlanner > Pros: no risk, no tech debt, no need to worry about backward > compatability > Cons: separate test cases for new planner, one more planner to maintain > > We'd like to hear the community's thoughts and advices. > > Thanks. > - Haisheng > > > > > > >
