Re: Set of k-cheapest plans

Rui Souto Thu, 19 Mar 2020 12:54:17 -0700

I should have provided some context in the original post. The idea is to
implement a similar strategy described in *Robustness Metrics for
Relational Query Execution Plans (2018) F Wolf et. Al* where they mention
the following


*PLAN CANDIDATES AND SELECTION *



Our novel robust plan selection strategy has three phases:
>
> First, we enumerate the set of robust plan candidates. Every robust plan
> candidate is a plan for the entire query, and not a sub-plan. Second, we
> calculate the robustness value for each robust plan candidate by applying
> one of the three robustness metrics. Third, we select the estimated most
> robust plan, i.e., the robust plan candidate with the smallest robustness
> value for execution. Apart from robustness, selecting a cheap query
> execution plan is still a major optimization goal. Consequently, our first
> criteria for the robust plan candidates is that they have to be the
> k-cheapest plans:
>


*Definition 13. *The k-cheapest plans are the k query execution plans with
> the smallest estimated cost.



The k-cheapest plans significantly reduce the number of plan candidates,
> and give a tight upper bound for the number of plans independent of the
> plan space. In addition, the k-cheapest plans can be utilised to apply
> additional constraints, such as memory consumption, on the plan set.
> Section 6 shows that k 􀀀 500 has a low optimization overhead. Furthermore,
> we show that the estimated robust plan inside k = 500 is competitive w.r.t.
> an estimated robust plan with larger k. Enumerating the k-cheapest plans is
> just a small modification in the optimizer. The trivial approach in a
> dynamic programming enumerator is to keep the k-cheapest plans in each plan
> class, instead of the cheapest plan. The k-cheapest plans of two plan
> classes can be combined to create plans of another plan class. We show in
> Section 6 that enumerating the k-cheapest plans is a reasonable overhead.



Since the k-cheapest plans can contain expensive plans for small queries,
> and only the cardinality-integral robustness metric takes plan cost into
> consideration, we further limit the robust plan candidates for the
> cardinality-slope and the selectivity-slope robustness metric to
> near-optimal plans:



*Definition 14.* The near-optimal plans are a sub-set of the plan space,
> containing the query execution plans with estimated cost at most -times
> larger than the estimated cost of the estimated optimal plan.



(...)
>


In sum, our plan selection strategy has very low risks:

First, we enumerate the k-cheapest plans. Second, we calculate the
> robustness value for each robust plan candidate. Though it is a reasonable
> overhead, it can be significant in very short running queries. It is not
> significant in our real world experiments in Section 6.1. In addition,
> dynamic programming enumeration is no limitation, but shows that our approach
> can be integrated into enterprise class optimizers.


Vladimir Ozerov <[email protected]> escreveu no dia quinta, 19/03/2020
à(s) 19:30:

> Hi Liya,
>
> This approach may work. However, as Julian mentioned earlier, the main
> question is why do you need these plans? If you collect the plans in the
> way you described, you may get several best plans which are only marginally
> different from each other, which are of little use.
>
> For example, there is one real scenario where several plans might be needed
> - join planning. Several Calcite-based engines perform join planning as a
> separate phase because otherwise planning might take too much time. And
> this phase produces only one RelNode. What we would like to have here
> instead, is a set of best plans for different join orders, e.g. "A join B"
> and "B join A". But in the proposed algorithm, you may easily get two
> flavors of "A join B", which is not what we need.
>
> So perhaps what we need instead, is the ability to switch off certain
> RelNode-s or even RelSubset-s from the plan during the best-exp
> calculation?
>
> Regards,
> Vladimir.
>
>
> чт, 19 мар. 2020 г. в 05:03, Fan Liya <[email protected]>:
>
> > IMO, there is no easy way, and the algorithm should depends on definition
> > of 'alternative plans'.
> >
> > In general, the algorithm can proceed like this:
> > 1. we use the volcano algoirthm to find the best plan
> > 2. we make the self cost of some node to be infinite, and then apply the
> > volcano algorithm again to find the second best plan
> > 3. we repeat the above steps, until we find the k best plans.
> >
> > Best,
> > Liya Fan
> >
> > On Thu, Mar 19, 2020 at 3:20 AM Julian Hyde <[email protected]> wrote:
> >
> > > There’s no easy way. You could modify ‘buildCheapestPlan()’ to build
> all
> > > plans below a cost limit. (You’d have to carefully choose a cost limit
> > only
> > > a little above the optimal cost, otherwise you’ll get huge numbers of
> > > plans.)
> > >
> > > I fear that you’ll get plans that are different in only trivial or
> minor
> > > respects (e.g. ordering of items in a Project) whereas you probably
> want
> > > plans with significant differences (e.g. different join orders).
> > >
> > > Julian
> > >
> > > > On Mar 18, 2020, at 12:01 PM, Rui Souto <[email protected]> wrote:
> > > >
> > > > Hi there!
> > > >
> > > > Just recently started to learn about Apache Calcite. What's the best
> > way
> > > to
> > > > get a list of the k-cheapest alternative plans generated by the
> > optimizer
> > > > for a given query? (being k an arbitrary number)
> > >
> > >
> >
>

Re: Set of k-cheapest plans

Reply via email to