Re: [DISCUSS] CEP-39: Cost Based Optimizer

Benjamin Lerer Thu, 14 Dec 2023 08:28:27 -0800

>
>   So yes, this physical plan is the structure that you have in mind but
> the idea of sharing it is not part of the CEP.



Sorry, Benedict, what I meant by sharing was sharing across the nodes. It
is an integral part of the optimizer API that the CEP talks about as it
represents its output. I did not realize that this part was not clear.

Le jeu. 14 déc. 2023 à 17:15, Benedict <bened...@apache.org> a écrit :

> Fwiw Chris, I agree with your concerns, but I think the introduction of a
> CBO - done right - is in principle a good thing in its own right. It’s
> independent of the issues you mention, even if it might enable features
> that exacerbate them.
>
> It should also help enable secondary indexes work better, which is I think
> their main justification. Otherwise there isn’t really a good reason for it
> today. Though I personally anticipate ongoing issues around 2i that SAI is
> perhaps over exuberantly sold as solving, and that a CBO will not fix. But
> we’ll see how that evolves.
>
>
>
> On 14 Dec 2023, at 15:49, Chris Lohfink <clohfin...@gmail.com> wrote:
>
> 
> I don't wanna be a blocker for this CEP or anything but did want to put my
> 2 cents in. This CEP is horrifying to me.
>
> I have seen thousands of clusters across multiple companies and helped
> them get working successfully. A vast majority of that involved blocking
> the use of MVs, GROUP BY, secondary indexes, and even just simple _range
> queries_. The "unncessary restrictions of cql" are not only necessary IMHO,
> more restrictions are necessary to be successful at scale. The idea of just
> opening up CQL to general purpose relational queries and lines like 
> "supporting
> queries with joins in an efficient way" ... I would really like us to
> make secondary indexes be a viable option before we start opening up
> floodgates on stuff like this.
>
> Chris
>
> On Thu, Dec 14, 2023 at 9:37 AM Benedict <bened...@apache.org> wrote:
>
>> > So yes, this physical plan is the structure that you have in mind but
>> the idea of sharing it is not part of the CEP.
>>
>>
>> I think it should be. This should form a major part of the API on which
>> any CBO is built.
>>
>>
>> > It seems that there is a difference between the goal of your proposal
>> and the one of the CEP. The goal of the CEP is first to ensure optimal
>> performance. It is ok to change the execution plan for one that delivers
>> better performance. What we want to minimize is having a node performing
>> queries in an inefficient way for a long period of time.
>>
>>
>> You have made a goal of the CEP synchronising summary statistics across
>> the whole cluster in order to achieve some degree of uniformity of query
>> plan. So this is explicitly a goal of the CEP, and synchronising summary
>> statistics is a hard problem and won’t provide strong guarantees.
>>
>>
>> > The client side proposal targets consistency for a given query on a
>> given driver instance. In practice, it would be possible to have 2 similar
>> queries with 2 different execution plans on the same driver
>>
>>
>> This would only be possible if the driver permitted it. A driver could
>> (and should) enforce that it only permits one query plan per query.
>>
>>
>> The opposite is true for your proposal: some queries may begin degrading
>> because they touch specific replicas that optimise the query differently,
>> and this will be hard to debug.
>>
>>
>> On 14 Dec 2023, at 15:30, Benjamin Lerer <b.le...@gmail.com> wrote:
>>
>> 
>> The binding of the parser output to the schema (what is today the
>> Raw.prepare call) will create the logical plan, expressed as a tree of
>> relational operators. Simplification and normalization will happen on that
>> tree to produce a new equivalent logical plan. That logical plan will be
>> used as input to the optimizer. The output will be a physical plan
>> producing the output specified by the logical plan. A tree of physical
>> operators specifying how the operations should be performed.
>>
>> That physical plan will be stored as part of the statements
>> (SelectStatement, ModificationStatement, ...) in the prepared statement
>> cache. Upon execution, variables will be bound and the
>> RangeCommands/Mutations will be created based on the physical plan.
>>
>> The string representation of a physical plan will effectively represent
>> the output of an EXPLAIN statement but outside of that the physical plan
>> will stay encapsulated within the statement classes.
>> Hints will be parameters provided to the optimizer to enforce some
>> specific choices. Like always using an Index Scan instead of a Table Scan,
>> ignoring the cost comparison.
>>
>> So yes, this physical plan is the structure that you have in mind but the
>> idea of sharing it is not part of the CEP. I did not document it because it
>> will simply be a tree of physical operators used internally.
>>
>> My proposal is that the execution plan of the coordinator that prepares a
>>> query gets serialised to the client, which then provides the execution plan
>>> to all future coordinators, and coordinators provide it to replicas as
>>> necessary.
>>>
>>>
>>> This means it is not possible for any conflict to arise for a single
>>> client. It would guarantee consistency of execution for any single client
>>> (and avoid any drift over the client’s sessions), without necessarily
>>> guaranteeing consistency for all clients.
>>>
>>
>>  It seems that there is a difference between the goal of your proposal
>> and the one of the CEP. The goal of the CEP is first to ensure optimal
>> performance. It is ok to change the execution plan for one that delivers
>> better performance. What we want to minimize is having a node performing
>> queries in an inefficient way for a long period of time.
>>
>> The client side proposal targets consistency for a given query on a given
>> driver instance. In practice, it would be possible to have 2 similar
>> queries with 2 different execution plans on the same driver making things
>> really confusing. Identifying the source of an inefficient query will also
>> be pretty hard.
>>
>> Interestingly, having 2 nodes with 2 different execution plans might not
>> be a serious problem. It simply means that based on cardinality at t1, the
>> optimizer on node 1 chose plan 1 while the one on node 2 chose plan 2 at
>> t2. In practice if the cost estimates reflect properly the actual cost
>> those 2 plans should have pretty similar efficiency. The problem is more
>> about the fact that you would ideally want a uniform behavior around your
>> cluster.
>> Changes of execution plans should only occur at certain points. So the
>> main problematic scenario is when the data distribution is around one of
>> those points. Which is also the point where the change should have the
>> least impact.
>>
>>
>> Le jeu. 14 déc. 2023 à 11:38, Benedict <bened...@apache.org> a écrit :
>>
>>> There surely needs to be a more succinct and abstract representation in
>>> order to perform transformations on the query plan? You don’t intend to
>>> manipulate the object graph directly as you apply any transformations when
>>> performing simplification or cost based analysis? This would also (I
>>> expect) be the form used to support EXPLAIN functionality, and probably
>>> also HINTs etc. This would ideally *not* be coupled to the CBO itself,
>>> and would ideally be succinctly serialised.
>>>
>>> I would very much expect the query plan to be represented abstractly as
>>> part of this work, and for there to be a mechanism that translates this
>>> abstract representation into the object graph that executes it.
>>>
>>> If I’m incorrect, could you please elaborate more specifically how you
>>> intend to go about this?
>>>
>>> On 14 Dec 2023, at 10:33, Benjamin Lerer <b.le...@gmail.com> wrote:
>>>
>>> 
>>>
>>>> I mean that an important part of this work - not specified in the CEP
>>>> (AFAICT) - should probably be to define some standard execution model, that
>>>> we can manipulate and serialise, for use across (and without) optimisers.
>>>
>>>
>>> I am confused because for me an execution model defines how operations
>>> are executed within the database in a conceptual way, which is not
>>> something that this CEP intends to change. Do you mean the
>>> physical/execution plan?
>>> Today this plan is somehow represented for reads by the SelectStatement
>>> and its components (Selections, StatementRestrictions, ...) it is then
>>> converted at execution time after parameter binding into a ReadCommand
>>> which is sent to the replicas.
>>> We plan to refactor SelectStatement and its components but the
>>> ReadCommands change should be relatively small. What you are proposing is
>>> not part of the scope of this CEP.
>>>
>>> Le jeu. 14 déc. 2023 à 10:24, Benjamin Lerer <b.le...@gmail.com> a
>>> écrit :
>>>
>>>> Can you share the reasons why Apache Calcite is not suitable for this
>>>>> case and why it was rejected
>>>>
>>>>
>>>> My understanding is that Calcite was made for two main things: to help
>>>> with optimizing SQL-like languages and to let people query different kinds
>>>> of data sources together.
>>>>
>>>> We could think about using it for our needs, but there are some big
>>>> problems:
>>>>
>>>>    1.
>>>>
>>>>    CQL is not SQL. There are significant differences between the 2
>>>>    languages
>>>>    2.
>>>>
>>>>    Cassandra has its own specificities that will influence the cost
>>>>    model and the way we deal with optimizations: partitions, replication
>>>>    factors, consistency levels, LSM tree storage, ...
>>>>    3.
>>>>
>>>>    Every framework comes with its own limitations and additional cost
>>>>
>>>> From my view, there are too many big differences between what Calcite
>>>> does and what we need in Cassandra. If we used Calcite, it would also mean
>>>> relying a lot on another system that everyone would have to learn and
>>>> adjust to. The problems and extra work this would bring don't seem worth
>>>> the benefits we might get
>>>>
>>>>
>>>> Le mer. 13 déc. 2023 à 18:06, Benjamin Lerer <b.le...@gmail.com> a
>>>> écrit :
>>>>
>>>>> One thing that I did not mention is the fact that this CEP is only a
>>>>> high level proposal. There will be deeper discussions on the dev list
>>>>> around the different parts of this proposal when we reach those parts and
>>>>> have enough details to make those discussions more meaningful.
>>>>>
>>>>>
>>>>>> The maintenance and distribution of summary statistics in particular
>>>>>> is worthy of its own CEP, and it might be preferable to split it out.
>>>>>
>>>>>
>>>>> For maintaining node statistics the idea is to re-use the current
>>>>> Memtable/SSTable mechanism and relies on mergeable statistics. That will
>>>>> allow us to easily build node level statistics for a given table by 
>>>>> merging
>>>>> all the statistics of its memtable and SSTables. For the distribution of
>>>>> these node statistics we are still exploring different options. We can 
>>>>> come
>>>>> back with a precise proposal once we have hammered all the details.
>>>>> Is it for you a blocker for this CEP or do you just want to make sure
>>>>> that this part is discussed in deeper details before we implement it?
>>>>>
>>>>>>
>>>>>> The proposal also seems to imply we are aiming for coordinators to
>>>>>> all make the same decision for a query, which I think is challenging, and
>>>>>> it would be worth fleshing out the design here a little (perhaps just in
>>>>>> Jira).
>>>>>
>>>>>
>>>>> The goal is that the large majority of nodes preparing a query at a
>>>>> given point in time should make the same decision and that over time all
>>>>> nodes should converge toward the same decision. This part is dependent on
>>>>> the node statistics distribution, the cost model and the triggers for
>>>>> re-optimization (that will require some experimentation).
>>>>>
>>>>> There’s also not much discussion of the execution model: I think it
>>>>>> would make most sense for this to be independent of any cost and 
>>>>>> optimiser
>>>>>> models (though they might want to operate on them), so that EXPLAIN and
>>>>>> hints can work across optimisers (a suitable hint might essentially 
>>>>>> bypass
>>>>>> the optimiser, if the optimiser permits it, by providing a standard
>>>>>> execution model)
>>>>>>
>>>>>
>>>>> It is not clear to me what you mean by "a standard execution model"?
>>>>> Otherwise, we were not planning to have the execution model or the hints
>>>>> depending on the optimizer.
>>>>>
>>>>> I think it would be worth considering providing the execution plan to
>>>>>> the client as part of query preparation, as an opaque payload to supply 
>>>>>> to
>>>>>> coordinators on first contact, as this might simplify the problem of
>>>>>> ensuring queries behave the same without adopting a lot of complexity for
>>>>>> synchronising statistics (which will never provide strong guarantees). Of
>>>>>> course, re-preparing a query might lead to a new plan, though any
>>>>>> coordinators with the query in their cache should be able to retrieve it
>>>>>> cheaply. If the execution model is efficiently serialised this might have
>>>>>> the ancillary benefit of improving the occupancy of our prepared query
>>>>>> cache.
>>>>>>
>>>>>
>>>>> I am not sure that I understand your proposal. If 2 nodes build a
>>>>> different execution plan how do you solve that conflict?
>>>>>
>>>>> Le mer. 13 déc. 2023 à 09:55, Benedict <bened...@apache.org> a écrit :
>>>>>
>>>>>> A CBO can only make worse decisions than the status quo for what I
>>>>>> presume are the majority of queries - i.e. those that touch only primary
>>>>>> indexes. In general, there are plenty of use cases that prefer 
>>>>>> determinism.
>>>>>> So I agree that there should at least be a CBO implementation that makes
>>>>>> the same decisions as the status quo, deterministically.
>>>>>>
>>>>>>
>>>>>> I do support the proposal, but would like to see some elements
>>>>>> discussed in more detail. The maintenance and distribution of summary
>>>>>> statistics in particular is worthy of its own CEP, and it might be
>>>>>> preferable to split it out. The proposal also seems to imply we are 
>>>>>> aiming
>>>>>> for coordinators to all make the same decision for a query, which I think
>>>>>> is challenging, and it would be worth fleshing out the design here a 
>>>>>> little
>>>>>> (perhaps just in Jira).
>>>>>>
>>>>>>
>>>>>> While I’m not a fan of ALLOW FILTERING, I’m not convinced that this
>>>>>> CEP deprecates it. It is a concrete qualitative guard rail, that I expect
>>>>>> some users will prefer to a cost-based guard rail. Perhaps this could be
>>>>>> left to the CBO to decide how to treat.
>>>>>>
>>>>>>
>>>>>> There’s also not much discussion of the execution model: I think it
>>>>>> would make most sense for this to be independent of any cost and 
>>>>>> optimiser
>>>>>> models (though they might want to operate on them), so that EXPLAIN and
>>>>>> hints can work across optimisers (a suitable hint might essentially 
>>>>>> bypass
>>>>>> the optimiser, if the optimiser permits it, by providing a standard
>>>>>> execution model)
>>>>>>
>>>>>>
>>>>>> I think it would be worth considering providing the execution plan to
>>>>>> the client as part of query preparation, as an opaque payload to supply 
>>>>>> to
>>>>>> coordinators on first contact, as this might simplify the problem of
>>>>>> ensuring queries behave the same without adopting a lot of complexity for
>>>>>> synchronising statistics (which will never provide strong guarantees). Of
>>>>>> course, re-preparing a query might lead to a new plan, though any
>>>>>> coordinators with the query in their cache should be able to retrieve it
>>>>>> cheaply. If the execution model is efficiently serialised this might have
>>>>>> the ancillary benefit of improving the occupancy of our prepared query
>>>>>> cache.
>>>>>>
>>>>>> On 13 Dec 2023, at 00:44, Jon Haddad <j...@jonhaddad.com> wrote:
>>>>>>
>>>>>> 
>>>>>> I think it makes sense to see what the actual overhead is of CBO
>>>>>> before making the assumption it'll be so high that we need to have two 
>>>>>> code
>>>>>> paths.  I'm happy to provide thorough benchmarking and analysis when it
>>>>>> reaches a testing phase.
>>>>>>
>>>>>> I'm excited to see where this goes.  I think it sounds very forward
>>>>>> looking and opens up a lot of possibilities.
>>>>>>
>>>>>> Jon
>>>>>>
>>>>>> On Tue, Dec 12, 2023 at 4:25 PM guo Maxwell <cclive1...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Nothing expresses my thoughts better than +1
>>>>>>> ，It feels like it means a lot to Cassandra.
>>>>>>>
>>>>>>> I have a question. Is it easy to turn off cbo's optimizer or by pass
>>>>>>> in some way? Because some simple read and write requests will have 
>>>>>>> better
>>>>>>> performance without cbo, which is also the advantage of Cassandra 
>>>>>>> compared
>>>>>>> to some rdbms.
>>>>>>>
>>>>>>>
>>>>>>> David Capwell <dcapw...@apple.com>于2023年12月13日 周三上午3:37写道：
>>>>>>>
>>>>>>>> Overall LGTM.
>>>>>>>>
>>>>>>>>
>>>>>>>> On Dec 12, 2023, at 5:29 AM, Benjamin Lerer <ble...@apache.org>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>> Hi everybody,
>>>>>>>>
>>>>>>>> I would like to open the discussion on the introduction of a cost
>>>>>>>> based optimizer to allow Cassandra to pick the best execution plan 
>>>>>>>> based on
>>>>>>>> the data distribution.Therefore, improving the overall query 
>>>>>>>> performance.
>>>>>>>>
>>>>>>>> This CEP should also lay the groundwork for the future addition of
>>>>>>>> features like joins, subqueries, OR/NOT and index ordering.
>>>>>>>>
>>>>>>>> The proposal is here:
>>>>>>>> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-39%3A+Cost+Based+Optimizer
>>>>>>>>
>>>>>>>> Thank you in advance for your feedback.
>>>>>>>>
>>>>>>>>
>>>>>>>>

Re: [DISCUSS] CEP-39: Cost Based Optimizer

Reply via email to