I should have provided some context in the original post. The idea is to implement a similar strategy described in *Robustness Metrics for Relational Query Execution Plans (2018) F Wolf et. Al* where they mention the following
*PLAN CANDIDATES AND SELECTION * Our novel robust plan selection strategy has three phases: > > First, we enumerate the set of robust plan candidates. Every robust plan > candidate is a plan for the entire query, and not a sub-plan. Second, we > calculate the robustness value for each robust plan candidate by applying > one of the three robustness metrics. Third, we select the estimated most > robust plan, i.e., the robust plan candidate with the smallest robustness > value for execution. Apart from robustness, selecting a cheap query > execution plan is still a major optimization goal. Consequently, our first > criteria for the robust plan candidates is that they have to be the > k-cheapest plans: > *Definition 13. *The k-cheapest plans are the k query execution plans with > the smallest estimated cost. The k-cheapest plans significantly reduce the number of plan candidates, > and give a tight upper bound for the number of plans independent of the > plan space. In addition, the k-cheapest plans can be utilised to apply > additional constraints, such as memory consumption, on the plan set. > Section 6 shows that k 500 has a low optimization overhead. Furthermore, > we show that the estimated robust plan inside k = 500 is competitive w.r.t. > an estimated robust plan with larger k. Enumerating the k-cheapest plans is > just a small modification in the optimizer. The trivial approach in a > dynamic programming enumerator is to keep the k-cheapest plans in each plan > class, instead of the cheapest plan. The k-cheapest plans of two plan > classes can be combined to create plans of another plan class. We show in > Section 6 that enumerating the k-cheapest plans is a reasonable overhead. Since the k-cheapest plans can contain expensive plans for small queries, > and only the cardinality-integral robustness metric takes plan cost into > consideration, we further limit the robust plan candidates for the > cardinality-slope and the selectivity-slope robustness metric to > near-optimal plans: *Definition 14.* The near-optimal plans are a sub-set of the plan space, > containing the query execution plans with estimated cost at most -times > larger than the estimated cost of the estimated optimal plan. (...) > In sum, our plan selection strategy has very low risks: First, we enumerate the k-cheapest plans. Second, we calculate the > robustness value for each robust plan candidate. Though it is a reasonable > overhead, it can be significant in very short running queries. It is not > significant in our real world experiments in Section 6.1. In addition, > dynamic programming enumeration is no limitation, but shows that our approach > can be integrated into enterprise class optimizers. Vladimir Ozerov <[email protected]> escreveu no dia quinta, 19/03/2020 à(s) 19:30: > Hi Liya, > > This approach may work. However, as Julian mentioned earlier, the main > question is why do you need these plans? If you collect the plans in the > way you described, you may get several best plans which are only marginally > different from each other, which are of little use. > > For example, there is one real scenario where several plans might be needed > - join planning. Several Calcite-based engines perform join planning as a > separate phase because otherwise planning might take too much time. And > this phase produces only one RelNode. What we would like to have here > instead, is a set of best plans for different join orders, e.g. "A join B" > and "B join A". But in the proposed algorithm, you may easily get two > flavors of "A join B", which is not what we need. > > So perhaps what we need instead, is the ability to switch off certain > RelNode-s or even RelSubset-s from the plan during the best-exp > calculation? > > Regards, > Vladimir. > > > чт, 19 мар. 2020 г. в 05:03, Fan Liya <[email protected]>: > > > IMO, there is no easy way, and the algorithm should depends on definition > > of 'alternative plans'. > > > > In general, the algorithm can proceed like this: > > 1. we use the volcano algoirthm to find the best plan > > 2. we make the self cost of some node to be infinite, and then apply the > > volcano algorithm again to find the second best plan > > 3. we repeat the above steps, until we find the k best plans. > > > > Best, > > Liya Fan > > > > On Thu, Mar 19, 2020 at 3:20 AM Julian Hyde <[email protected]> wrote: > > > > > There’s no easy way. You could modify ‘buildCheapestPlan()’ to build > all > > > plans below a cost limit. (You’d have to carefully choose a cost limit > > only > > > a little above the optimal cost, otherwise you’ll get huge numbers of > > > plans.) > > > > > > I fear that you’ll get plans that are different in only trivial or > minor > > > respects (e.g. ordering of items in a Project) whereas you probably > want > > > plans with significant differences (e.g. different join orders). > > > > > > Julian > > > > > > > On Mar 18, 2020, at 12:01 PM, Rui Souto <[email protected]> wrote: > > > > > > > > Hi there! > > > > > > > > Just recently started to learn about Apache Calcite. What's the best > > way > > > to > > > > get a list of the k-cheapest alternative plans generated by the > > optimizer > > > > for a given query? (being k an arbitrary number) > > > > > > > > >
