Re: Set of k-cheapest plans

Julian Hyde Thu, 19 Mar 2020 13:22:17 -0700

I have long wanted to implement "Multi-Objective Parametric Query Optimization” 
[1] in Calcite. Though it would be a huge amount of work.


What is elegant about that approach is that you can see the rate of change of 
the cost as you change any one of the parameters (e.g. the amount of memory 
available), and you can find what are the precise values of the parameters at 
which the cheapest plan changes (e.g. as memory becomes available, perhaps a 
hash-join is preferred over merge-join). 

Measuring the rate of change of outcome as you change each of the inputs is 
tied (in my mind at least) with the notion of robustness.

Julian

[1] http://www.vldb.org/pvldb/vol8/p221-trummer.pdf 
<http://www.vldb.org/pvldb/vol8/p221-trummer.pdf>

> On Mar 19, 2020, at 12:53 PM, Rui Souto <[email protected]> wrote:
> 
> I should have provided some context in the original post. The idea is to
> implement a similar strategy described in *Robustness Metrics for
> Relational Query Execution Plans (2018) F Wolf et. Al* where they mention
> the following
> 
> *PLAN CANDIDATES AND SELECTION *
> 
> 
> 
> Our novel robust plan selection strategy has three phases:
>> 
>> First, we enumerate the set of robust plan candidates. Every robust plan
>> candidate is a plan for the entire query, and not a sub-plan. Second, we
>> calculate the robustness value for each robust plan candidate by applying
>> one of the three robustness metrics. Third, we select the estimated most
>> robust plan, i.e., the robust plan candidate with the smallest robustness
>> value for execution. Apart from robustness, selecting a cheap query
>> execution plan is still a major optimization goal. Consequently, our first
>> criteria for the robust plan candidates is that they have to be the
>> k-cheapest plans:
>> 
> 
> 
> *Definition 13. *The k-cheapest plans are the k query execution plans with
>> the smallest estimated cost.
> 
> 
> 
> The k-cheapest plans significantly reduce the number of plan candidates,
>> and give a tight upper bound for the number of plans independent of the
>> plan space. In addition, the k-cheapest plans can be utilised to apply
>> additional constraints, such as memory consumption, on the plan set.
>> Section 6 shows that k 􀀀 500 has a low optimization overhead. Furthermore,
>> we show that the estimated robust plan inside k = 500 is competitive w.r.t.
>> an estimated robust plan with larger k. Enumerating the k-cheapest plans is
>> just a small modification in the optimizer. The trivial approach in a
>> dynamic programming enumerator is to keep the k-cheapest plans in each plan
>> class, instead of the cheapest plan. The k-cheapest plans of two plan
>> classes can be combined to create plans of another plan class. We show in
>> Section 6 that enumerating the k-cheapest plans is a reasonable overhead.
> 
> 
> 
> Since the k-cheapest plans can contain expensive plans for small queries,
>> and only the cardinality-integral robustness metric takes plan cost into
>> consideration, we further limit the robust plan candidates for the
>> cardinality-slope and the selectivity-slope robustness metric to
>> near-optimal plans:
> 
> 
> 
> *Definition 14.* The near-optimal plans are a sub-set of the plan space,
>> containing the query execution plans with estimated cost at most -times
>> larger than the estimated cost of the estimated optimal plan.
> 
> 
> 
> (...)
>> 
> 
> 
> In sum, our plan selection strategy has very low risks:
> 
> First, we enumerate the k-cheapest plans. Second, we calculate the
>> robustness value for each robust plan candidate. Though it is a reasonable
>> overhead, it can be significant in very short running queries. It is not
>> significant in our real world experiments in Section 6.1. In addition,
>> dynamic programming enumeration is no limitation, but shows that our approach
>> can be integrated into enterprise class optimizers.
> 
> 
> Vladimir Ozerov <[email protected]> escreveu no dia quinta, 19/03/2020
> à(s) 19:30:
> 
>> Hi Liya,
>> 
>> This approach may work. However, as Julian mentioned earlier, the main
>> question is why do you need these plans? If you collect the plans in the
>> way you described, you may get several best plans which are only marginally
>> different from each other, which are of little use.
>> 
>> For example, there is one real scenario where several plans might be needed
>> - join planning. Several Calcite-based engines perform join planning as a
>> separate phase because otherwise planning might take too much time. And
>> this phase produces only one RelNode. What we would like to have here
>> instead, is a set of best plans for different join orders, e.g. "A join B"
>> and "B join A". But in the proposed algorithm, you may easily get two
>> flavors of "A join B", which is not what we need.
>> 
>> So perhaps what we need instead, is the ability to switch off certain
>> RelNode-s or even RelSubset-s from the plan during the best-exp
>> calculation?
>> 
>> Regards,
>> Vladimir.
>> 
>> 
>> чт, 19 мар. 2020 г. в 05:03, Fan Liya <[email protected]>:
>> 
>>> IMO, there is no easy way, and the algorithm should depends on definition
>>> of 'alternative plans'.
>>> 
>>> In general, the algorithm can proceed like this:
>>> 1. we use the volcano algoirthm to find the best plan
>>> 2. we make the self cost of some node to be infinite, and then apply the
>>> volcano algorithm again to find the second best plan
>>> 3. we repeat the above steps, until we find the k best plans.
>>> 
>>> Best,
>>> Liya Fan
>>> 
>>> On Thu, Mar 19, 2020 at 3:20 AM Julian Hyde <[email protected]> wrote:
>>> 
>>>> There’s no easy way. You could modify ‘buildCheapestPlan()’ to build
>> all
>>>> plans below a cost limit. (You’d have to carefully choose a cost limit
>>> only
>>>> a little above the optimal cost, otherwise you’ll get huge numbers of
>>>> plans.)
>>>> 
>>>> I fear that you’ll get plans that are different in only trivial or
>> minor
>>>> respects (e.g. ordering of items in a Project) whereas you probably
>> want
>>>> plans with significant differences (e.g. different join orders).
>>>> 
>>>> Julian
>>>> 
>>>>> On Mar 18, 2020, at 12:01 PM, Rui Souto <[email protected]> wrote:
>>>>> 
>>>>> Hi there!
>>>>> 
>>>>> Just recently started to learn about Apache Calcite. What's the best
>>> way
>>>> to
>>>>> get a list of the k-cheapest alternative plans generated by the
>>> optimizer
>>>>> for a given query? (being k an arbitrary number)
>>>> 
>>>> 
>>> 
>>

Re: Set of k-cheapest plans

Reply via email to