Re: [DISCUSS] CEP-39: Cost Based Optimizer

Jeff Jirsa Thu, 14 Dec 2023 08:28:49 -0800

I'm also torn on the CEP as presented. I think some of it is my negative
emotional response to the examples - e.g. I've literally never seen a real
use case where unfolding constants matters, and I'm trying to convince
myself to read past that.


I also cant tell what exactly you mean when you say "In order to ensure
that the execution plans on each node are the same, the cardinality
estimator should provide the same global statistics on every node as well
as some notification mechanism that can be used to trigger
re-optimization." In my experience, you'll see variable cost on each host,
where a machine that went offline temporarily got a spike in sstables from
repair and has a compaction backlog, causing a higher cost per read on that
host due to extra sstables/duplicate rows/merges. Is the cost based
optimizer in your model going to understand the different cost per replica
and also use that in choosing the appropriate replicas to query?

Finally: ALLOW FILTERING should not be deprecated. It doesn't matter if the
CBO may be able to help improve queries that have filtering. That guard
exists because most people who are new to cassandra don't understand the
difference and it prevents far more self-inflicted failures than anyone can
count. Please do not remove this. You will instantly create a world where
most new users to the database tip over as soon as their adoption picks up.



On Thu, Dec 14, 2023 at 7:49 AM Chris Lohfink <[email protected]> wrote:

> I don't wanna be a blocker for this CEP or anything but did want to put my
> 2 cents in. This CEP is horrifying to me.
>
> I have seen thousands of clusters across multiple companies and helped
> them get working successfully. A vast majority of that involved blocking
> the use of MVs, GROUP BY, secondary indexes, and even just simple _range
> queries_. The "unncessary restrictions of cql" are not only necessary IMHO,
> more restrictions are necessary to be successful at scale. The idea of just
> opening up CQL to general purpose relational queries and lines like 
> "supporting
> queries with joins in an efficient way" ... I would really like us to
> make secondary indexes be a viable option before we start opening up
> floodgates on stuff like this.
>
> Chris
>
> On Thu, Dec 14, 2023 at 9:37 AM Benedict <[email protected]> wrote:
>
>> > So yes, this physical plan is the structure that you have in mind but
>> the idea of sharing it is not part of the CEP.
>>
>>
>> I think it should be. This should form a major part of the API on which
>> any CBO is built.
>>
>>
>> > It seems that there is a difference between the goal of your proposal
>> and the one of the CEP. The goal of the CEP is first to ensure optimal
>> performance. It is ok to change the execution plan for one that delivers
>> better performance. What we want to minimize is having a node performing
>> queries in an inefficient way for a long period of time.
>>
>>
>> You have made a goal of the CEP synchronising summary statistics across
>> the whole cluster in order to achieve some degree of uniformity of query
>> plan. So this is explicitly a goal of the CEP, and synchronising summary
>> statistics is a hard problem and won’t provide strong guarantees.
>>
>>
>> > The client side proposal targets consistency for a given query on a
>> given driver instance. In practice, it would be possible to have 2 similar
>> queries with 2 different execution plans on the same driver
>>
>>
>> This would only be possible if the driver permitted it. A driver could
>> (and should) enforce that it only permits one query plan per query.
>>
>>
>> The opposite is true for your proposal: some queries may begin degrading
>> because they touch specific replicas that optimise the query differently,
>> and this will be hard to debug.
>>
>>
>> On 14 Dec 2023, at 15:30, Benjamin Lerer <[email protected]> wrote:
>>
>> 
>> The binding of the parser output to the schema (what is today the
>> Raw.prepare call) will create the logical plan, expressed as a tree of
>> relational operators. Simplification and normalization will happen on that
>> tree to produce a new equivalent logical plan. That logical plan will be
>> used as input to the optimizer. The output will be a physical plan
>> producing the output specified by the logical plan. A tree of physical
>> operators specifying how the operations should be performed.
>>
>> That physical plan will be stored as part of the statements
>> (SelectStatement, ModificationStatement, ...) in the prepared statement
>> cache. Upon execution, variables will be bound and the
>> RangeCommands/Mutations will be created based on the physical plan.
>>
>> The string representation of a physical plan will effectively represent
>> the output of an EXPLAIN statement but outside of that the physical plan
>> will stay encapsulated within the statement classes.
>> Hints will be parameters provided to the optimizer to enforce some
>> specific choices. Like always using an Index Scan instead of a Table Scan,
>> ignoring the cost comparison.
>>
>> So yes, this physical plan is the structure that you have in mind but the
>> idea of sharing it is not part of the CEP. I did not document it because it
>> will simply be a tree of physical operators used internally.
>>
>> My proposal is that the execution plan of the coordinator that prepares a
>>> query gets serialised to the client, which then provides the execution plan
>>> to all future coordinators, and coordinators provide it to replicas as
>>> necessary.
>>>
>>>
>>> This means it is not possible for any conflict to arise for a single
>>> client. It would guarantee consistency of execution for any single client
>>> (and avoid any drift over the client’s sessions), without necessarily
>>> guaranteeing consistency for all clients.
>>>
>>
>>  It seems that there is a difference between the goal of your proposal
>> and the one of the CEP. The goal of the CEP is first to ensure optimal
>> performance. It is ok to change the execution plan for one that delivers
>> better performance. What we want to minimize is having a node performing
>> queries in an inefficient way for a long period of time.
>>
>> The client side proposal targets consistency for a given query on a given
>> driver instance. In practice, it would be possible to have 2 similar
>> queries with 2 different execution plans on the same driver making things
>> really confusing. Identifying the source of an inefficient query will also
>> be pretty hard.
>>
>> Interestingly, having 2 nodes with 2 different execution plans might not
>> be a serious problem. It simply means that based on cardinality at t1, the
>> optimizer on node 1 chose plan 1 while the one on node 2 chose plan 2 at
>> t2. In practice if the cost estimates reflect properly the actual cost
>> those 2 plans should have pretty similar efficiency. The problem is more
>> about the fact that you would ideally want a uniform behavior around your
>> cluster.
>> Changes of execution plans should only occur at certain points. So the
>> main problematic scenario is when the data distribution is around one of
>> those points. Which is also the point where the change should have the
>> least impact.
>>
>>
>> Le jeu. 14 déc. 2023 à 11:38, Benedict <[email protected]> a écrit :
>>
>>> There surely needs to be a more succinct and abstract representation in
>>> order to perform transformations on the query plan? You don’t intend to
>>> manipulate the object graph directly as you apply any transformations when
>>> performing simplification or cost based analysis? This would also (I
>>> expect) be the form used to support EXPLAIN functionality, and probably
>>> also HINTs etc. This would ideally *not* be coupled to the CBO itself,
>>> and would ideally be succinctly serialised.
>>>
>>> I would very much expect the query plan to be represented abstractly as
>>> part of this work, and for there to be a mechanism that translates this
>>> abstract representation into the object graph that executes it.
>>>
>>> If I’m incorrect, could you please elaborate more specifically how you
>>> intend to go about this?
>>>
>>> On 14 Dec 2023, at 10:33, Benjamin Lerer <[email protected]> wrote:
>>>
>>> 
>>>
>>>> I mean that an important part of this work - not specified in the CEP
>>>> (AFAICT) - should probably be to define some standard execution model, that
>>>> we can manipulate and serialise, for use across (and without) optimisers.
>>>
>>>
>>> I am confused because for me an execution model defines how operations
>>> are executed within the database in a conceptual way, which is not
>>> something that this CEP intends to change. Do you mean the
>>> physical/execution plan?
>>> Today this plan is somehow represented for reads by the SelectStatement
>>> and its components (Selections, StatementRestrictions, ...) it is then
>>> converted at execution time after parameter binding into a ReadCommand
>>> which is sent to the replicas.
>>> We plan to refactor SelectStatement and its components but the
>>> ReadCommands change should be relatively small. What you are proposing is
>>> not part of the scope of this CEP.
>>>
>>> Le jeu. 14 déc. 2023 à 10:24, Benjamin Lerer <[email protected]> a
>>> écrit :
>>>
>>>> Can you share the reasons why Apache Calcite is not suitable for this
>>>>> case and why it was rejected
>>>>
>>>>
>>>> My understanding is that Calcite was made for two main things: to help
>>>> with optimizing SQL-like languages and to let people query different kinds
>>>> of data sources together.
>>>>
>>>> We could think about using it for our needs, but there are some big
>>>> problems:
>>>>
>>>>    1.
>>>>
>>>>    CQL is not SQL. There are significant differences between the 2
>>>>    languages
>>>>    2.
>>>>
>>>>    Cassandra has its own specificities that will influence the cost
>>>>    model and the way we deal with optimizations: partitions, replication
>>>>    factors, consistency levels, LSM tree storage, ...
>>>>    3.
>>>>
>>>>    Every framework comes with its own limitations and additional cost
>>>>
>>>> From my view, there are too many big differences between what Calcite
>>>> does and what we need in Cassandra. If we used Calcite, it would also mean
>>>> relying a lot on another system that everyone would have to learn and
>>>> adjust to. The problems and extra work this would bring don't seem worth
>>>> the benefits we might get
>>>>
>>>>
>>>> Le mer. 13 déc. 2023 à 18:06, Benjamin Lerer <[email protected]> a
>>>> écrit :
>>>>
>>>>> One thing that I did not mention is the fact that this CEP is only a
>>>>> high level proposal. There will be deeper discussions on the dev list
>>>>> around the different parts of this proposal when we reach those parts and
>>>>> have enough details to make those discussions more meaningful.
>>>>>
>>>>>
>>>>>> The maintenance and distribution of summary statistics in particular
>>>>>> is worthy of its own CEP, and it might be preferable to split it out.
>>>>>
>>>>>
>>>>> For maintaining node statistics the idea is to re-use the current
>>>>> Memtable/SSTable mechanism and relies on mergeable statistics. That will
>>>>> allow us to easily build node level statistics for a given table by 
>>>>> merging
>>>>> all the statistics of its memtable and SSTables. For the distribution of
>>>>> these node statistics we are still exploring different options. We can 
>>>>> come
>>>>> back with a precise proposal once we have hammered all the details.
>>>>> Is it for you a blocker for this CEP or do you just want to make sure
>>>>> that this part is discussed in deeper details before we implement it?
>>>>>
>>>>>>
>>>>>> The proposal also seems to imply we are aiming for coordinators to
>>>>>> all make the same decision for a query, which I think is challenging, and
>>>>>> it would be worth fleshing out the design here a little (perhaps just in
>>>>>> Jira).
>>>>>
>>>>>
>>>>> The goal is that the large majority of nodes preparing a query at a
>>>>> given point in time should make the same decision and that over time all
>>>>> nodes should converge toward the same decision. This part is dependent on
>>>>> the node statistics distribution, the cost model and the triggers for
>>>>> re-optimization (that will require some experimentation).
>>>>>
>>>>> There’s also not much discussion of the execution model: I think it
>>>>>> would make most sense for this to be independent of any cost and 
>>>>>> optimiser
>>>>>> models (though they might want to operate on them), so that EXPLAIN and
>>>>>> hints can work across optimisers (a suitable hint might essentially 
>>>>>> bypass
>>>>>> the optimiser, if the optimiser permits it, by providing a standard
>>>>>> execution model)
>>>>>>
>>>>>
>>>>> It is not clear to me what you mean by "a standard execution model"?
>>>>> Otherwise, we were not planning to have the execution model or the hints
>>>>> depending on the optimizer.
>>>>>
>>>>> I think it would be worth considering providing the execution plan to
>>>>>> the client as part of query preparation, as an opaque payload to supply 
>>>>>> to
>>>>>> coordinators on first contact, as this might simplify the problem of
>>>>>> ensuring queries behave the same without adopting a lot of complexity for
>>>>>> synchronising statistics (which will never provide strong guarantees). Of
>>>>>> course, re-preparing a query might lead to a new plan, though any
>>>>>> coordinators with the query in their cache should be able to retrieve it
>>>>>> cheaply. If the execution model is efficiently serialised this might have
>>>>>> the ancillary benefit of improving the occupancy of our prepared query
>>>>>> cache.
>>>>>>
>>>>>
>>>>> I am not sure that I understand your proposal. If 2 nodes build a
>>>>> different execution plan how do you solve that conflict?
>>>>>
>>>>> Le mer. 13 déc. 2023 à 09:55, Benedict <[email protected]> a écrit :
>>>>>
>>>>>> A CBO can only make worse decisions than the status quo for what I
>>>>>> presume are the majority of queries - i.e. those that touch only primary
>>>>>> indexes. In general, there are plenty of use cases that prefer 
>>>>>> determinism.
>>>>>> So I agree that there should at least be a CBO implementation that makes
>>>>>> the same decisions as the status quo, deterministically.
>>>>>>
>>>>>>
>>>>>> I do support the proposal, but would like to see some elements
>>>>>> discussed in more detail. The maintenance and distribution of summary
>>>>>> statistics in particular is worthy of its own CEP, and it might be
>>>>>> preferable to split it out. The proposal also seems to imply we are 
>>>>>> aiming
>>>>>> for coordinators to all make the same decision for a query, which I think
>>>>>> is challenging, and it would be worth fleshing out the design here a 
>>>>>> little
>>>>>> (perhaps just in Jira).
>>>>>>
>>>>>>
>>>>>> While I’m not a fan of ALLOW FILTERING, I’m not convinced that this
>>>>>> CEP deprecates it. It is a concrete qualitative guard rail, that I expect
>>>>>> some users will prefer to a cost-based guard rail. Perhaps this could be
>>>>>> left to the CBO to decide how to treat.
>>>>>>
>>>>>>
>>>>>> There’s also not much discussion of the execution model: I think it
>>>>>> would make most sense for this to be independent of any cost and 
>>>>>> optimiser
>>>>>> models (though they might want to operate on them), so that EXPLAIN and
>>>>>> hints can work across optimisers (a suitable hint might essentially 
>>>>>> bypass
>>>>>> the optimiser, if the optimiser permits it, by providing a standard
>>>>>> execution model)
>>>>>>
>>>>>>
>>>>>> I think it would be worth considering providing the execution plan to
>>>>>> the client as part of query preparation, as an opaque payload to supply 
>>>>>> to
>>>>>> coordinators on first contact, as this might simplify the problem of
>>>>>> ensuring queries behave the same without adopting a lot of complexity for
>>>>>> synchronising statistics (which will never provide strong guarantees). Of
>>>>>> course, re-preparing a query might lead to a new plan, though any
>>>>>> coordinators with the query in their cache should be able to retrieve it
>>>>>> cheaply. If the execution model is efficiently serialised this might have
>>>>>> the ancillary benefit of improving the occupancy of our prepared query
>>>>>> cache.
>>>>>>
>>>>>> On 13 Dec 2023, at 00:44, Jon Haddad <[email protected]> wrote:
>>>>>>
>>>>>> 
>>>>>> I think it makes sense to see what the actual overhead is of CBO
>>>>>> before making the assumption it'll be so high that we need to have two 
>>>>>> code
>>>>>> paths.  I'm happy to provide thorough benchmarking and analysis when it
>>>>>> reaches a testing phase.
>>>>>>
>>>>>> I'm excited to see where this goes.  I think it sounds very forward
>>>>>> looking and opens up a lot of possibilities.
>>>>>>
>>>>>> Jon
>>>>>>
>>>>>> On Tue, Dec 12, 2023 at 4:25 PM guo Maxwell <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Nothing expresses my thoughts better than +1
>>>>>>> ，It feels like it means a lot to Cassandra.
>>>>>>>
>>>>>>> I have a question. Is it easy to turn off cbo's optimizer or by pass
>>>>>>> in some way? Because some simple read and write requests will have 
>>>>>>> better
>>>>>>> performance without cbo, which is also the advantage of Cassandra 
>>>>>>> compared
>>>>>>> to some rdbms.
>>>>>>>
>>>>>>>
>>>>>>> David Capwell <[email protected]>于2023年12月13日 周三上午3:37写道：
>>>>>>>
>>>>>>>> Overall LGTM.
>>>>>>>>
>>>>>>>>
>>>>>>>> On Dec 12, 2023, at 5:29 AM, Benjamin Lerer <[email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>> Hi everybody,
>>>>>>>>
>>>>>>>> I would like to open the discussion on the introduction of a cost
>>>>>>>> based optimizer to allow Cassandra to pick the best execution plan 
>>>>>>>> based on
>>>>>>>> the data distribution.Therefore, improving the overall query 
>>>>>>>> performance.
>>>>>>>>
>>>>>>>> This CEP should also lay the groundwork for the future addition of
>>>>>>>> features like joins, subqueries, OR/NOT and index ordering.
>>>>>>>>
>>>>>>>> The proposal is here:
>>>>>>>> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-39%3A+Cost+Based+Optimizer
>>>>>>>>
>>>>>>>> Thank you in advance for your feedback.
>>>>>>>>
>>>>>>>>
>>>>>>>>

Re: [DISCUSS] CEP-39: Cost Based Optimizer

Reply via email to