It's been some time since I last looked into the code, but the most recent Hive paper [1] mostly talks about Calcite in the query optimization section, so I have to say I am a bit surprised.
[1] https://arxiv.org/pdf/1903.10970.pdf

On Mon, Dec 9, 2019 at 6:21 PM Vladimir Ozerov <[email protected]> wrote:

After looking at the Hive implementation, I have the impression that it doesn't use Apache Calcite for physical planning, hence it doesn't have the problems mentioned in this topic.

On Sun, Dec 8, 2019 at 6:55 PM Vladimir Ozerov <[email protected]> wrote:

Hi Stamatis,

Thank you for the idea about Hive. I looked at it some time ago, and the codebase was substantially harder for me to understand than in other projects, so I gave up. I'll try to do the analysis again.
I'd like to mention that I also had a thought that maybe the implementation of top-down optimization is not a concern of VolcanoPlanner, and a brand-new planner may play well here. But from a practical perspective, of course, I keep hoping that we will find a less intrusive way to introduce efficient physical optimization into VolcanoPlanner :-)

Regards,
Vladimir.

On Sun, Dec 8, 2019 at 12:42 PM Stamatis Zampetakis <[email protected]> wrote:

Thanks Vladimir for this great summary. It is really helpful to know how the different projects use the optimizer, and it certainly helps to identify limitations in our implementation.

I cannot provide any valuable feedback at the moment, since I have to find some time to read your analysis more carefully.

In the meantime, I know that Hive has also been using Calcite for quite some time now, so maybe you can get some new ideas (or complete your background study) by looking at their code.

@Haisheng: I think many people did appreciate the discussion on trait pull-up, so I wouldn't say that we abandoned it. I had the impression that we were waiting for a design doc.

In general, it may not be feasible to cover all use cases with a single optimizer. I wouldn't find it bad to introduce another planner if there are enough reasons to do so.

Best,
Stamatis

On Fri, Dec 6, 2019, 11:00 AM Vladimir Ozerov <[email protected]> wrote:

"all we know is their *collations*" -> "all we know is their *traits*"

On Fri, Dec 6, 2019 at 12:57 PM Vladimir Ozerov <[email protected]> wrote:

Hi Haisheng,

Thank you for your response. Let me elaborate on my note about join planning first. What I was trying to say is not that the rules on their own have deficiencies, but that with the current planner implementation, users tend to separate join planning from the core optimization process, as in the pseudo-code below. As a result, only one join permutation is considered during physical planning, even though the join rule may potentially generate multiple plans worth exploring:

RelNode optimizedLogicalNode = doJoinPlanning(logicalNode);
RelNode physicalNode = doPhysicalPlanning(optimizedLogicalNode);
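For concreteness, the decoupled flow above often looks roughly like the sketch below, built on Calcite's HepPlanner and the standard multi-join rules. The exact rule sets and wiring vary from project to project, so treat this only as an illustration:

import org.apache.calcite.plan.hep.HepPlanner;
import org.apache.calcite.plan.hep.HepProgram;
import org.apache.calcite.plan.hep.HepProgramBuilder;
import org.apache.calcite.rel.RelNode;
import org.apache.calcite.rel.rules.JoinToMultiJoinRule;
import org.apache.calcite.rel.rules.LoptOptimizeJoinRule;

class JoinPlanningPhase {
    // Heuristic join-ordering pass: collapse joins into a MultiJoin, let
    // LoptOptimizeJoinRule pick a single order, and return that one plan.
    // The physical (Volcano) phase afterwards never sees the alternatives.
    static RelNode doJoinPlanning(RelNode logicalNode) {
        HepProgram program = new HepProgramBuilder()
            .addRuleInstance(JoinToMultiJoinRule.INSTANCE)
            .addRuleInstance(LoptOptimizeJoinRule.INSTANCE)
            .build();
        HepPlanner hepPlanner = new HepPlanner(program);
        hepPlanner.setRoot(logicalNode);
        return hepPlanner.findBestExp();
    }
}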
Now back to the main question. I re-read your thread about on-demand trait propagation [1] carefully. I have to admit that when I read it for the first time about a month ago, I failed to understand some details due to my poor knowledge of different optimizer architectures. Now I understand it much better, and we are definitely concerned with exactly the same problem. I feel that trait pull-up might be a step in the right direction; however, it seems to me that it is not the complete solution. Let me try to explain why I think so.

An efficient optimizer should try to save as much CPU as possible, because that allows us to explore more plans in a sensible amount of time. To achieve this, we should avoid redundant operations, and detect and prune inefficient paths aggressively. As far as I understand the idea of trait pull-up, we essentially explore the space of possible physical properties of children nodes without forcing their implementation. But after that, Calcite will explore those nodes again, this time in order to execute implementation rules. I.e., we will do two dives - one to enumerate the nodes (trait pull-up API), and another to implement them (implementation rules) - while in Cascades one dive should be sufficient, since exploration invokes the implementation rules as it goes. This is the first issue I see.

The second one is more important: how to prune inefficient plans? Currently, nodes are implemented independently, and the lack of context doesn't allow us to estimate children's costs when implementing the parent, hence branch-and-bound is not possible. Can the trait pull-up API "List<RelTraitSet> deriveTraitSets(RelNode, RelMetadataQuery)" help us with this? If the children nodes are not implemented before the pull-up, all we know is their collations, but not their costs. And without costs, pruning is not possible. Please let me know if I missed something in the proposal.

The possible architecture I had in mind after reading multiple papers, which may answer all our questions, could look like this:
1) We have a queue of nodes requiring optimization, not a queue of rules. The initial queue state is formed from the initial tree, top-down.
2) A node is popped from the queue, and we enter the "node.optimize(maxCost)" call. It checks for matching rules, prioritizes them, and starts their execution one by one. Execution of rules may re-insert the current node into the queue, in which case this step is repeated, possibly many times.
3) Logical-logical rules (transformations) produce new logical nodes and put them into the queue for further optimization.
4) Logical-physical rules (implementation) do the following:
4.1) The costs of logical children are estimated. The cost of a logical node should be less than the cost of any physical node that may be produced out of it. If the logical cost exceeds "maxCost", we stop and return. The whole logical subspace is pruned even before exploration.
4.2) Recursively call the "childNode.optimize(maxCost - currentLogicalCost)" method, which returns a set of possible physical implementations of a child. Returned physical children are already registered in the proper set/subset, but are not used for any pattern matching and don't trigger more rule calls!
4.3) The implementation rule checks the cost of the physical child. If it is greater than that of any other already observed child with the same traits, or if it exceeds "maxCost", it is discarded. Otherwise, the physical implementation of the current node is produced and registered in the optimizer.

The pseudo-code for the physical implementation flow for a join (two inputs):

Collection<RelNode> optimizePhysical(Cost maxCost) {
    // Estimated minimal self-cost. Any physical implementation of this
    // node should have a greater self-cost.
    Cost logicalSelfCost = optimizer.getCost(this);

    // *Pruning #1*: whatever children we implement, the total cost will
    // be greater than the passed maxCost, so do not explore further.
    Cost maxChildCost = maxCost - logicalSelfCost;

    Cost logicalLeftCost = optimizer.getCost(leftLogicalNode);
    Cost logicalRightCost = optimizer.getCost(rightLogicalNode);

    if (logicalLeftCost + logicalRightCost > maxChildCost) {
        return Collections.emptyList();
    }

    // This is our equivalence set.
    RelSet equivalenceSet = this.getSet();

    // Get promising physical implementations of child nodes recursively.
    List<RelNode> leftPhysicalNodes = leftLogicalNode.optimizePhysical(maxChildCost);
    List<RelNode> rightPhysicalNodes = rightLogicalNode.optimizePhysical(maxChildCost);

    for (RelNode leftPhysicalNode : leftPhysicalNodes) {
        for (RelNode rightPhysicalNode : rightPhysicalNodes) {
            // *Pruning #2*: the combination of physical input costs is
            // already too expensive, give up.
            Cost physicalLeftCost = optimizer.getCost(leftPhysicalNode);
            Cost physicalRightCost = optimizer.getCost(rightPhysicalNode);

            if (physicalLeftCost + physicalRightCost > maxChildCost) {
                continue;
            }

            // Implement possible physical nodes for the given pair of
            // inputs (maybe more than one).
            List<RelNode> physicalJoins = implement(leftPhysicalNode, rightPhysicalNode);

            for (RelNode physicalJoin : physicalJoins) {
                // *Pruning #3*: do not consider this implementation if we
                // already have another one with the same trait set and a
                // smaller cost.
                Cost physicalJoinCost = optimizer.getCost(physicalJoin);
                Cost bestCostForTraitSet = equivalenceSet.getBestCost(physicalJoin.getTraitSet());

                if (physicalJoinCost > bestCostForTraitSet) {
                    continue;
                }

                // This is a good implementation. Register it in the set,
                // updating per-trait-set best costs.
                equivalenceSet.register(physicalJoin);
            }
        }
    }

    // Return the best registered expressions with different trait sets
    // from the current set.
    return equivalenceSet.getBestExps();
}

This is very rough pseudo-code, intended only to demonstrate the basic idea of how proper bottom-up propagation not only helps us set the proper traits on the new physical node, but also ensures that non-optimal plans are pruned as early as possible. A real implementation should be better abstracted and accept enforcers as well.
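As one sketch of what "accept enforcers" could mean here, in the same assumed API as the pseudo-code above (createEnforcer is hypothetical): when a physical child returned by the recursive call doesn't satisfy the traits the parent needs, an enforcer such as a sort or an exchange is placed on top of it, and the enforced child must still fit the cost budget:

RelNode enforce(RelNode physicalChild, RelTraitSet requiredTraits, Cost maxCost) {
    if (physicalChild.getTraitSet().satisfies(requiredTraits)) {
        return physicalChild; // already satisfies, nothing to enforce
    }

    // Put an enforcer (e.g. a physical sort or an exchange) on top.
    RelNode enforced = createEnforcer(physicalChild, requiredTraits);

    // The enforcer has a cost of its own, so re-check the budget.
    if (optimizer.getCost(enforced) > maxCost) {
        return null; // prune: even the enforced child is too expensive
    }

    return enforced;
}

The important point is only that the budget check happens before the enforced node is registered, so enforcers participate in branch-and-bound like any other physical node.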
Also, please notice that the pseudo-code doesn't show when logical rules are fired. This is a separate question. One possible straightforward way is to add the aforementioned physical routine to the normal Volcano flow (a rough sketch of step 2 follows the list):
1) Fire a logical rule on a node and register the new nodes.
2) Fire physical optimization as shown above, then invoke "call.transformTo()" for every returned physical rel.
3) Re-trigger the process for newly created nodes and their parents.
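A rough sketch of step 2 as a regular implementation rule; PhysicalJoinRule, optimizePhysical and Cost.INFINITE are the hypothetical names from the pseudo-code above, not existing Calcite API (Calcite imports omitted):

public class PhysicalJoinRule extends RelOptRule {
    public PhysicalJoinRule() {
        super(operand(LogicalJoin.class, any()));
    }

    @Override
    public void onMatch(RelOptRuleCall call) {
        LogicalJoin join = call.rel(0);
        // No budget at the root: start with an infinite upper bound and
        // hand every surviving physical rel back to the planner.
        for (RelNode physicalJoin : optimizePhysical(join, Cost.INFINITE)) {
            call.transformTo(physicalJoin);
        }
    }
}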
A better approach is to interleave logical and physical optimizations, so that they trigger each other recursively. But this would require a serious redesign of the "rule queue" concept.

Does it have any common points with your proposal?

Regards,
Vladimir.

[1]
https://ponymail-vm.apache.org/_GUI_/thread.html/79dac47ea50b5dfbd3f234e368ed61d247fb0eb989f87fe01aedaf25@%3Cdev.calcite.apache.org%3E

On Fri, Dec 6, 2019 at 5:41 AM Haisheng Yuan <[email protected]> wrote:

Oh, I forgot to mention that join planning/reordering is not a big problem. Calcite just uses the rule to generate a single alternative plan, which is not ideal, but we can't say Calcite is doing it wrong.

Ideally, we expect it to generate multiple (neither all, nor a single one) bipartite graphs. The join reordering rule would cut each part into bipartite graphs recursively and apply the JoinCommutativity rule to generate more alternatives for each RelSet. It is just a different strategy. We can modify the rule, or create a new join reordering rule, to generate multiple plan alternatives.

- Haisheng

------------------------------------------------------------------
From: Haisheng Yuan <[email protected]>
Date: 2019-12-06 09:07:43
To: Vladimir Ozerov <[email protected]>; [email protected]
Subject: Re: Re: Volcano's problem with trait propagation: current state and future

Generally I agree with what Vladimir said. I think what Calcite has is logical optimization, or exploration; there is almost no physical optimization - Calcite leaves it to third-party implementors. One of my friends in the University of Wisconsin–Madison database research group told me that they gave up the idea of using Calcite in their project for this reason.

Currently, physical properties are requested in implementation rules, or even in logical exploration rules. But each rule is independent; the pattern-matched expression is not aware of what the parent operator wants. Using AbstractConverter doesn't help, and is not promising.

>> You shouldn't register all logical rules in the planner simultaneously, ... as Drill does.
That is because Calcite does too much redundant or duplicate rule matching, like all kinds of transposes (which can't be avoided with the current design) matching physical operators.

>> decoupling the logical planning from the physical one looks a bit weird to me because it violates the idea of the Cascades framework.
The Orca optimizer fully adopted the design principles of the Cascades framework, except that it separates optimization into 3 phases: logical exploration, physical implementation, and optimization (property enforcement). It might also be easier to enable parallel optimization by separating into 3 phases. Orca does branch-and-bound in the optimization phase, before actual property derivation and enforcement, IIRC. It is highly efficient, works pretty well, and is battle-tested by many large financial and insurance companies.

In my last thread about on-demand trait requests, I gave the high-level general API for physical operators to derive and require physical properties, which is similar to Orca's design. But it seems like the proposed API change gets no love.

- Haisheng

------------------------------------------------------------------
From: Vladimir Ozerov <[email protected]>
Date: 2019-12-05 22:22:43
To: [email protected]
Subject: Re: Volcano's problem with trait propagation: current state and future

AbstractConverter-s are attractive because they effectively emulate straightforward recursive top-down optimization (Volcano/Cascades). But instead of doing it with a recursive method call, which preserves the context, in Calcite we do it as a sequence of unrelated rule calls, thus losing the context. So with my current understanding, it could be thought of not as a search-space explosion, but rather as an inefficient implementation of otherwise straightforward parent->child->parent navigation, since we achieve this navigation indirectly through the rule queue rather than through a normal method call. In any case, the net result is wasted CPU. Perhaps this is not exponential waste, but some multiple of otherwise optimal planning. As I mentioned, in our experiments with TPC-H we observed a constant factor of 6-9x between the number of joins and the number of join implementation rule invocations. It doesn't grow past 9 even for complex queries, so I hope this is not an exponent :-)
Speaking of logical vs. physical optimization, IMO it makes sense in some cases. E.g., when doing predicate pushdown, you do not want to consider intermediate logical tree states for implementation until the predicate reaches its final position. So running a separate logical planning phase with the Volcano optimizer makes total sense to me, because it effectively prunes a lot of non-optimal logical plans before they reach the physical planning stage. The real problem to me is that we are forced to remove join planning from the physical optimization stage. The goal of join planning, unlike predicate pushdown, is not to generate a single optimal plan, but rather to generate a set of logical plans, all of which should be implemented and estimated. With AbstractConverter-s this is not possible, because their multiplier increases the rate of search-space growth, making join planning inapplicable even for a small number of relations. So we have to move it to the logical planning stage and pick only one permutation for physical planning.

On Thu, Dec 5, 2019 at 3:35 PM Roman Kondakov <[email protected]> wrote:

Vladimir,

thank you for bringing this up. We are facing the same problems in the Apache Ignite project, and it would be great if the Apache Calcite community could propose a solution for this issue.

From my point of view, the approach with abstract converters looks more promising, but as you mentioned, it suffers from polluting the search space. The latter can be mitigated by splitting the planning stage into several phases: you shouldn't register all logical rules in the planner simultaneously - it looks like it is better to have several iterations of the planning stage with different sets of rules, as Drill does.

Also, I'd like to mention that decoupling the logical planning from the physical one looks a bit weird to me, because it violates the idea of the Cascades framework. Possibly this decoupling is the consequence of some performance issues.

--
Kind Regards
Roman Kondakov

On 05.12.2019 14:24, Vladimir Ozerov wrote:

Hi,

As I mentioned before, we are building a distributed SQL engine that uses Apache Calcite for query optimization. The key problem we faced is the inability to pull the physical traits of child relations efficiently. I'd like to outline my understanding of the problem (I guess it was already discussed multiple times), ask the community to confirm or disprove the existence of that problem and its severity for the products which use Apache Calcite, and ask for ideas on how it could be improved in the future.

I'll start with a simplified problem description and mention more complex use cases afterwards. Consider that we have a logical tree and a set of implementation rules. Our goal is to find the optimal physical tree by applying these rules. The classical Cascades-based approach directs the optimization process from the top to the bottom (hence "top-down"). However, the actual implementation of tree nodes still happens bottom-up. For the tree L1 <- L2, we enter "optimize(L1)", which recursively delegates to "optimize(L2)". We then implement the children nodes, L1 <- [P2', P2''], and return to the parent, which is now able to pick promising implementations of the children nodes and reject bad ones with the branch-and-bound approach. AFAIK, Pivotal's Orca works this way.

Apache Calcite is very different, because it doesn't allow such recursion, so we lose the context of which node requested the child transformation. This loss of context leads to the following problems:
1) The parent node cannot deduce its physical properties during the execution of the implementation rule, because Calcite expects the transformation to be applied before the children nodes are implemented. That is, if we are optimizing LogicalProject <- LogicalScan, we cannot set the proper distribution and collation on the to-be-created PhysicalProject, because they depend on the distribution and collation of the children, which are yet to be resolved (see the sketch below).
2) Branch-and-bound cannot be used, because it requires at least one fully-built physical subtree.
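To make problem #1 concrete, here is roughly what such an implementation rule is forced to do today; PhysicalProject and the PHYSICAL convention are placeholders for an engine's own operators, not Calcite classes (Calcite imports omitted):

public class PhysicalProjectRule extends RelOptRule {
    public PhysicalProjectRule() {
        super(operand(LogicalProject.class, any()));
    }

    @Override
    public void onMatch(RelOptRuleCall call) {
        LogicalProject project = call.rel(0);
        // The input is still logical when the rule fires, so the result
        // traits must be picked *before* the child is implemented - i.e.
        // the collation/distribution below is at best a guess.
        RelTraitSet guessedTraits = project.getTraitSet().replace(PHYSICAL);
        RelNode input = convert(project.getInput(), guessedTraits);
        call.transformTo(new PhysicalProject(project.getCluster(), guessedTraits,
            input, project.getProjects(), project.getRowType()));
    }
}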
As a result of this limitation, products that rely on Apache Calcite for query optimization use one or several workarounds:

*1) Guess the physical properties of parent nodes before logical children are implemented*
*Apache Flink* uses this strategy. The strategy is bad because the number of combinations of traits grows exponentially with the number of attributes in the given RelNode, so you either explode the search space or give up optimization opportunities. Consider the following tree:
LogicalSort[a ASC] <- LogicalFilter <- LogicalScan
The optimal implementation of the LogicalFilter is PhysicalFilter[collation=a ASC], because it may eliminate the parent sort. But such an optimization should happen only if we know that there is a physical implementation of the scan allowing for this sort order, e.g. PhysicalIndexScan[collation=a ASC]. I.e., we need to know the child's physical properties first. Otherwise, we fall back to speculative approaches. With the *optimistic* approach, we emit all possible combinations of physical properties, in the hope that the child will satisfy some of them, thus expanding the search space exponentially. With the *pessimistic* approach, we simply miss this optimization opportunity even if the index exists. Apache Flink uses the pessimistic approach.

*2) Use AbstractConverters*
*Apache Drill* uses this strategy. The idea is to "glue" logical and physical operators, so that the implementation of a physical child re-triggers the implementation rule of the logical parent. The flow is as follows (a minimal setup sketch is shown after the list):
- Invoke the parent implementation rule - it either doesn't produce new physical nodes, or produces non-optimized physical nodes (as in the Apache Flink case)
- Invoke the children implementation rules and create the physical children
- Then the converters kick in and re-trigger the parent implementation rule through the creation of an abstract converter
- Finally, the parent implementation rule is fired again, and now it produces optimized node(s), since at least some of the physical distributions of the children nodes are implemented
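The wiring for this workaround is small (a sketch; Drill's actual setup differs): it is the registration of ExpandConversionRule that makes the planner create AbstractConverters between subsets and thereby re-fire the parent rules:

// Sketch: enabling the converter-based workaround on a VolcanoPlanner.
VolcanoPlanner planner = new VolcanoPlanner();
planner.addRelTraitDef(ConventionTraitDef.INSTANCE);
planner.addRelTraitDef(RelCollationTraitDef.INSTANCE);
planner.addRelTraitDef(RelDistributionTraitDef.INSTANCE);
planner.addRule(AbstractConverter.ExpandConversionRule.INSTANCE);
// ... plus the engine's own logical and physical rules.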
Note that this is essentially a hack to simulate the Cascades flow! The problem is that AbstractConverters increase the complexity of planning, because they do not have any context, so parent rules are just re-triggered blindly. Consider the optimization of the following tree:
LogicalJoin <- [LogicalScan1, LogicalScan2]
With the converter approach, the join implementation rule will be fired at least 3 times, while in reality one call should be sufficient. In our experiments with TPC-H queries, a join rule implemented that way is typically called 6-9 times more often than expected.

*3) Transformations (i.e. logical optimization) are decoupled from implementation (i.e. physical optimization)*
Normally, you would like to mix both logical and physical rules in a single optimization program, because this is required for proper planning. That is, you should consider both (Ax(BxC)) and ((AxB)xC) join orderings during physical optimization, because you do not know in advance which one will produce the better plan. But in some practical implementations of Calcite-based optimizers this is not the case, and join planning is performed as a separate HEP stage. Examples are Apache Drill and Apache Flink. I believe that the lack of a Cascades-style flow and of branch-and-bound is among the main reasons for this - at the very least for Apache Drill, since it uses converters, so additional logical permutations would exponentially multiply the number of fired rules, which is already very big.

Given all these problems, I'd like to ask the community to share current thoughts and ideas on the future improvement of the Calcite optimizer. One of the ideas being discussed in the community is "Pull-up Traits", which should allow the parent node to get the physical properties of the children nodes. But in order to do this, you effectively need to implement the children, which IMO makes this process indistinguishable from the classical recursive Cascades algorithm.
Have you considered recursive transformations as an alternative solution to this problem? I.e., instead of trying to guess or pull up the physical properties of non-existent physical nodes, go ahead and actually implement them directly from within the parent rule? This may resolve the problem with trait pull-up, as well as allow for branch-and-bound optimization.

I would appreciate your feedback or any hints for future research.

Regards,
Vladimir.
