Re: [DISCUSS] CEP-39: Cost Based Optimizer

Benjamin Lerer Thu, 21 Dec 2023 09:08:38 -0800

Hi Scott,

Thanks for your feedback.


If I am not mistaken the main concern in your email is that without
features that will heavily benefit from the optimizer, this work will not
bring much. You are, therefore, under the impression that this work is one
or two years early.

From my perspective, we are already late. We have several features running in
production that we chose not to open source yet because implementing phase
1 of the CEP would have greatly simplified their designs. The cost of
developing them was much higher than it would have been if the CEP had
already been implemented. We are also currently working on some SAI
features that need cost based optimization. This CEP will not be ready
for at least a year, so I imagine that it will be even more needed by
that time.

The proposal is a pretty generic framework. The approach has been around
for several decades and has been conceived with evolution in mind.

Regarding the CQL and SQL discussion, you seem to link relational algebra
to SQL but consider it separate from CQL. Like SQL, CQL is a query language
used to perform queries on relations (sets of tuples), and as such
relational algebra can be used to express those queries. Using relational
algebra within the CQL layer does not change in any way how CQL works today.
On the other hand, it will allow us to restructure the code, making evolution
much easier. There is no reference to the SQL language in the proposal, even
if it refers in multiple places to optimization solutions implemented in
relational databases.
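
As a rough illustration of that point (a minimal sketch, not Cassandra code),
a simple CQL query can be written as a small tree of relational operators.
The class names Scan, Select and Project, the evaluate helper and the sample
"users" table are all assumptions made up for this example.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union

Row = Dict[str, object]


@dataclass(frozen=True)
class Scan:
    """Leaf operator: produce every tuple of a relation."""
    table: str


@dataclass(frozen=True)
class Select:
    """Selection: keep only the tuples where column == value."""
    child: "Operator"
    column: str
    value: object


@dataclass(frozen=True)
class Project:
    """Projection: keep only the named columns of each tuple."""
    child: "Operator"
    columns: Tuple[str, ...]


Operator = Union[Scan, Select, Project]


def evaluate(op: Operator, tables: Dict[str, List[Row]]) -> List[Row]:
    """Naive interpreter for the hypothetical operator tree."""
    if isinstance(op, Scan):
        return list(tables[op.table])
    if isinstance(op, Select):
        return [r for r in evaluate(op.child, tables) if r[op.column] == op.value]
    if isinstance(op, Project):
        return [{c: r[c] for c in op.columns} for r in evaluate(op.child, tables)]
    raise TypeError(f"unknown operator: {op!r}")


# SELECT name FROM users WHERE pk = 0
plan = Project(Select(Scan("users"), "pk", 0), ("name",))
tables = {"users": [{"pk": 0, "name": "alice"}, {"pk": 1, "name": "bob"}]}
result = evaluate(plan, tables)
```

The query text is unchanged; only the internal representation becomes a tree
that simplification and optimization rules can rewrite.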

For Calcite, I took some time to download the source code, play with it and
dig into the documentation before discarding it. You can talk with Maxim, I
believe that he has a deeper knowledge of the tool than I have and could
provide a different perspective.

Let me know if I misunderstood some of your comments or missed some
questions.

On Wed, Dec 20, 2023 at 5:15 PM C. Scott Andreas <sc...@paradoxica.net>
wrote:

> Thanks for this proposal and apologies for my delayed engagement during
> the Cassandra Summit last week. Benjamin, I appreciate your work on this
> and your engagement on this thread – I know it’s a lot of discussion to
> field.
>
> On ALLOW FILTERING:
>
> I share Chris Lohfink’s experience in operating clusters that have made
> heavy use of ALLOW FILTERING. It is a valuable guardrail for the database
> to require users specifically annotate queries that may cost 1000x+ that of
> a simple lookup for a primary key. For my own purposes, I’d actually like
> to go a step further and disable queries that require ALLOW FILTERING by
> default unless explicitly reviewed - but haven’t taken the step of adding
> such a guardrail yet.
>
> CBOs, CQL, and SQL:
>
> The CBO proposal cuts to the heart of one of the fundamental differences
> between SQL and CQL that I haven’t seen exercised yet.
>
> SQL allows users to define schemas that provide structure to data and to
> issue queries over them based on a relational algebra. SQL’s purpose is to
> decouple the on-disk representation of data from the query language used to
> access and aggregate it. This produces a very flexible query language that
> can be used to ask a database anything - but at a cost of execution that
> may be effectively infinite (think recursive subqueries).
>
> CQL is very different. While SQL is designed to decouple query language
> and on-disk representation, CQL is designed specifically to couple them. A
> CQL schema declares data placement, query routing, disk serialization, and
> sorting to enable efficient retrieval. This is a very different design
> goal from a general-purpose query language. In time CQL may gain many
> SQL-like capabilities (and I hope it does!), but it will require careful
> work to do so without creating many footguns.
>
> Feature evolution:
>
> I agree that in the coming years, Cassandra is likely to gain
> semi-relational features via maturation of the byte-ordered partitioner
> (via range splitting, via TCM); the availability of SAI and its evolution
> (e.g., via new functionality enabled by Lucene libraries); potentially
> joins via BOP; and others. This is a really exciting future, and one that
> probably requires a planner and optimizer.
>
> My general inclination is that a planner + optimizer seem valuable for
> Cassandra, but that the proposal feels a year or two early. The database
> doesn’t yet have a plan of record to add support for some of the
> semirelational constructs we’ve talked about, and I’m not aware of active
> CEPs that propose designs for features like these yet.
>
> Like Jeff, I’d find this much easier to discuss in the context of a
> database gaining support for these features with specific designs available
> to discuss. The ALLOW FILTERING and constant folding examples are a little
> slim. Index selection is probably the best one I can think of right now -
> e.g., if we wanted to add the ability to issue partition-restricted queries
> over a base table with multiple indexes defined without users specifically
> declaring an index. I haven’t seen an at-scale use case that would be
> better served by planner-driven index selection vs. user-driven, but they
> might be out there.
>
> It’s not my role to suggest changes in prioritization for work that isn’t
> mine. But I feel that the project could design better interfaces and a
> better planner/optimizer if that work were oriented toward improving
> particular features that are in wide use.
>
> To summarize my thoughts:
>
> – I agree that it is valuable for Apache Cassandra to gain a
> planner/optimizer.
> – I disagree with removing or deprecating ALLOW FILTERING and see this as
> a necessary guardrail.
> – I think the proposal surfaces the differences between the design goals
> of CQL and SQL, but I don’t feel that it quite addresses it.
> – I think we could collectively build a stronger planner/optimizer once
> some of the features it’s meant to optimize are in place.
> – I’m not quite sold on the need for the implementation to be bespoke
> based on discussion so far (vs. Calcite/Catalyst etc), but haven’t done the
> legwork to investigate this myself.
> – I *love* the idea of capturing many of the execution and hotness
> statistics that are proposed in the CEP. It would be very valuable to
> surface query cost to users independent of a CBO. Stats like these would
> also be valuable toward retrofitting Cassandra for multitenancy by bounding
> or rate-limiting users on query cost. Tracking SSTable hotness would also
> be useful toward evaluating feasibility of tiered storage, too.
>
> Thanks for this proposal and the discussion so far. I appreciate it and am
> enjoying it.
>
> – Scott
>
> On Dec 20, 2023, at 7:52 AM, Benjamin Lerer <ble...@apache.org> wrote:
>
>
>> If we are to address that within the CEP itself then we should discuss it
>> here, as I would like to fully understand the approach as well as how it
>> relates to consistency of execution and the idea of triggering
>> re-optimisation.
>
> Sure, that was my plan.
>
>> I’m not sold on the proposed set of characteristics, and think my coupling
>> an execution plan to a given prepared statement for clients to supply is
>> perhaps simpler to implement and maintain, and has corollary benefits -
>> such as providing a mechanism for users to specify their own execution plan.
>>
>> Note, my proposal cuts across all of these elements of the CEP. There is
>> no obvious need for a cross-cluster re-optimisation event or cross cluster
>> statistic management.
>
> I think that I am missing one part of your proposal. How do you plan to
> build the initial execution plan for a prepared statement?
>
> On Wed, Dec 20, 2023 at 14:05, Benedict <bened...@apache.org> wrote:
>
>>
>> If we are to address that within the CEP itself then we should discuss it
>> here, as I would like to fully understand the approach as well as how it
>> relates to consistency of execution and the idea of triggering
>> re-optimisation. These ideas are all interrelated.
>>
>> I’m not sold on the proposed set of characteristics, and think my
>> coupling an execution plan to a given prepared statement for clients to
>> supply is perhaps simpler to implement and maintain, and has corollary
>> benefits - such as providing a mechanism for users to specify their own
>> execution plan.
>>
>> Note, my proposal cuts across all of these elements of the CEP. There is
>> no obvious need for a cross-cluster re-optimisation event or cross cluster
>> statistic management.
>>
>> We still also need to discuss more concretely how the base statistics
>> themselves will be derived, as there is little detail here today in the
>> proposal.
>>
>>
>> On 20 Dec 2023, at 12:58, Benjamin Lerer <b.le...@gmail.com> wrote:
>>
>> After the second phase of the CEP, we will have two optimizer
>> implementations. One will be similar to what we have today and the other
>> one will be the CBO. As those implementations will be behind the new
>> Optimizer API interfaces, they will both have support for EXPLAIN and they
>> will both benefit from the simplification/normalization rules, such as the
>> ones that David mentioned.
>>
>> Regarding functions, we are already able to determine which ones are
>> deterministic (
>> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/cql3/functions/Function.java#L55).
>> We simply do not take advantage of it.
>>
>> I removed the ALLOW FILTERING part and will open a discussion about it at
>> the beginning of next year.
>>
>> Regarding the statistics management part, I would like to try to address
>> it within the CEP itself, if feasible. If it turns out to be too
>> complicated, I will separate it into its own CEP.
>>
>> On Tue, Dec 19, 2023 at 22:23, David Capwell <dcapw...@apple.com> wrote:
>>
>>> even if the only outcome of all this work were to tighten up
>>> inconsistencies in our grammar and provide more robust EXPLAIN and EXPLAIN
>>> ANALYZE functionality to our end users, I think that would be highly
>>> valuable
>>>
>>>
>>> In my mental model a no-op optimizer just becomes what we have today
>>> (since all new features really should be disabled by default, I would hope
>>> we support this), so we benefit from having a logical AST + ability to
>>> mutate it before we execute it and we can use this to make things nicer for
>>> users (as you are calling out)
>>>
>>> Here is one example that stands out to me in Accord
>>>
>>> LET a = (select * from tbl where pk=0);
>>> Insert into tbl2 (pk, …) values (a.pk, …); -- this is not allowed as we
>>> don’t know the primary key… but this could trivially be rewritten to
>>> replace a.pk with 0…
>>>
>>> With this work we could also rethink what functions are deterministic
>>> and which ones are not (not trying to bike shed)… simple example is
>>> “now” (select now() from tbl; — each row will have a different timestamp),
>>> if we make this deterministic we can avoid calling it for each row and
>>> instead just replace it with a constant for the query…
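
The folding idea described above can be sketched as follows. This is only an
illustration under stated assumptions: the Literal, Column and Call node
types, the FUNCTIONS registry and its "deterministic" flag are hypothetical
names, not the Cassandra Function API.

```python
import uuid
from dataclasses import dataclass
from typing import Callable, Dict, Tuple, Union


@dataclass(frozen=True)
class Literal:
    value: object


@dataclass(frozen=True)
class Column:
    name: str


@dataclass(frozen=True)
class Call:
    function: str
    args: Tuple["Expr", ...]


Expr = Union[Literal, Column, Call]


@dataclass(frozen=True)
class FunctionDef:
    impl: Callable[..., object]
    deterministic: bool  # hypothetical flag: same inputs always give same output


# Hypothetical registry used only for this sketch.
FUNCTIONS: Dict[str, FunctionDef] = {
    "abs": FunctionDef(abs, True),
    "add": FunctionDef(lambda a, b: a + b, True),
    "uuid": FunctionDef(lambda: uuid.uuid4(), False),
}


def fold(expr: Expr) -> Expr:
    """Replace deterministic calls on constant arguments with a literal."""
    if isinstance(expr, Call):
        args = tuple(fold(a) for a in expr.args)
        fn = FUNCTIONS[expr.function]
        if fn.deterministic and all(isinstance(a, Literal) for a in args):
            # Evaluated once at planning time instead of once per row.
            return Literal(fn.impl(*(a.value for a in args)))
        return Call(expr.function, args)
    return expr


# add(v, abs(-3)) keeps the column reference but folds abs(-3) into 3.
folded = fold(Call("add", (Column("v"), Call("abs", (Literal(-3),)))))
# uuid() is not deterministic, so it is left in place.
kept = fold(Call("uuid", ()))
```

Whether a function such as now() should be treated as constant for the
duration of a query is exactly the policy question raised above; the sketch
only shows the mechanics once that decision is made.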
>>>
>>> Even if the CBO is dropped in favor of no-op (what we do today), I still
>>> see value in this work.
>>>
>>> I do think that the CBO really doesn’t solve the fact some features
>>> don’t work well, if anything it could just mask it until it’s too late….
>>> If user builds an app using filtering and everything is going well in QA,
>>> but once they see a spike in traffic in prod we start rejecting… this is a
>>> bad user experience IMO… we KNOW you must think about this before you go
>>> this route, so a CBO letting you ignore it till you hit a wall I don’t
>>> think is the best (not saying ALLOW FILTERING is the solution to this… but
>>> it at least is a signal to users to think through their data model).
>>>
>>>
>>> On Dec 15, 2023, at 6:38 PM, Josh McKenzie <jmcken...@apache.org> wrote:
>>>
>>> Goals
>>>
>>>    - Introduce a Cascades(2) query optimizer with rules easily
>>>    extendable
>>>    - Improve query performance for most common queries
>>>    - Add support for EXPLAIN and EXPLAIN ANALYZE to help with query
>>>    optimization and troubleshooting
>>>    - Lay the groundwork for the addition of features like joins,
>>>    subqueries, OR/NOT and index ordering
>>>    - Put in place some performance benchmarks to validate query
>>>    optimizations
>>>
>>> I think these are sensible goals. We're possibly going to face a
>>> chicken-or-egg problem with a feature like this that so heavily intersects
>>> with other as-yet-unwritten features where much of the value is in the
>>> intersection of them; if we continue down the current "one heuristic to
>>> rule them all" query planning approach we have now, we'll struggle to
>>> meaningfully explore or conceptualize the value of potential alternatives
>>> different optimizers could present us. Flip side, to Benedict's point,
>>> until SAI hits and/or some other potential future things we've all talked
>>> about, this CBO would likely fall directly into the same path that we
>>> effectively have hard-coded today (primary index path only).
>>>
>>> One thing I feel pretty strongly about: even if the only outcome of all
>>> this work were to tighten up inconsistencies in our grammar and provide
>>> more robust EXPLAIN and EXPLAIN ANALYZE functionality to our end users, I
>>> think that would be highly valuable. This path of "only" would be
>>> predicated on us not having successful introduction of a robust secondary
>>> index implementation and a variety of other things we have a lot of
>>> interest in, so I find it unlikely, but worth calling out.
>>>
>>> re: the removal of ALLOW FILTERING - is there room for compromise here
>>> and instead converting it to a guardrail that defaults to being enabled?
>>> That could theoretically give us a more gradual path to migration to a
>>> cost-based guardrail for instance, and would preserve the current
>>> robustness of the system while making it at least a touch more configurable.
>>>
>>> On Fri, Dec 15, 2023, at 11:03 AM, Chris Lohfink wrote:
>>>
>>> Thanks for taking the time to address concerns. At least with initial versions,
>>> as long as there is a way to replace it with noop or disable it I would be
>>> happy. This is pretty standard practice with features nowadays but I wanted
>>> to highlight it as this might require some pretty tight coupling.
>>>
>>> Chris
>>>
>>> On Fri, Dec 15, 2023 at 7:57 AM Benjamin Lerer <ble...@apache.org>
>>> wrote:
>>>
>>> Hey Chris,
>>> You raise some valid points.
>>>
>>> I believe that there are 3 points that you mentioned:
>>> 1) CQL restrictions are some form of safety net and should be kept
>>> 2) A lot of Cassandra features do not scale and/or are too easy to use
>>> in a wrong way that can make the whole system collapse. We should not add
>>> more to that list. Especially not joins.
>>>
>>> 3) Should we not start to fix features like secondary index rather than
>>> adding new ones? Which is heavily linked to 2).
>>>
>>> Feel free to correct me if I got them wrong or missed one.
>>>
>>> Regarding 1), I believe that you refer to the "Removing unnecessary CQL
>>> query limitations and inconsistencies" section. We are not planning to
>>> remove any safety net here.
>>> What we want to remove is a number of limitations which, for no good
>>> reason, make things confusing for a user trying to write a query, such as
>>> "why can I define a column alias but not use it anywhere in my query?"
>>> or "Why can I not create a list with 2 bind parameters?". While refactoring
>>> some CQL code, I kept on finding those types of exceptions that we can
>>> easily remove while simplifying the code at the same time.
>>>
>>> For 2), I agree that at a certain scale or for some scenarios, some
>>> features simply do not scale or catch users by surprise. The goal of the
>>> CEP is to improve things in 2 ways. One is by making Cassandra smarter in
>>> the way it chooses how to process queries, hopefully improving its overall
>>> scalability. The other by being transparent about how Cassandra will
>>> execute the queries through the use of EXPLAIN. One problem with GROUP BY,
>>> for example, is that most users do not realize what is actually happening
>>> under the hood, and therefore miss its limitations. I do not believe that
>>> EXPLAIN will
>>> change everything but it will help people to get a better understanding of
>>> the limitations of some features.
>>>
>>> I do not know which features will be added in the future to C*. That
>>> will be discussed through some future CEPs. Nevertheless, I do not believe
>>> that it makes sense to write a CEP for a query optimizer without taking
>>> into account that we might at some point add some level of support for
>>> joins or subqueries. We have too often delivered features without
>>> looking at their possible evolutions, which resulted in code
>>> where adding new features was more complex than it should have been. I do
>>> not want to make the same mistake. I want to create an optimizer that can
>>> be improved easily, and considering joins or other features simply helps to
>>> build things in a more generic way.
>>>
>>> Regarding feature stabilization, I believe that it is happening. I have
>>> heard plans of how to solve MVs, range queries, hot partitions, ... and
>>> there was a lot of thinking behind those plans. Secondary indexes are being
>>> worked on. We hope that the optimizer will also help with some index
>>> queries.
>>>
>>> It seems to me that this proposal is going in the direction that you
>>> want without introducing new problems for scalability.
>>>
>>>
>>>
>>>
>>> On Thu, Dec 14, 2023 at 16:47, Chris Lohfink <clohfin...@gmail.com> wrote:
>>>
>>> I don't wanna be a blocker for this CEP or anything but did want to put
>>> my 2 cents in. This CEP is horrifying to me.
>>>
>>> I have seen thousands of clusters across multiple companies and helped
>>> them get working successfully. A vast majority of that involved blocking
>>> the use of MVs, GROUP BY, secondary indexes, and even just simple _range
>>> queries_. The "unnecessary restrictions of CQL" are not only necessary IMHO,
>>> more restrictions are necessary to be successful at scale. The idea of just
>>> opening up CQL to general purpose relational queries and lines like
>>> "supporting queries with joins in an efficient way" ... I would really like
>>> us to make secondary indexes be a viable option before we start opening up
>>> floodgates on stuff like this.
>>>
>>> Chris
>>>
>>> On Thu, Dec 14, 2023 at 9:37 AM Benedict <bened...@apache.org> wrote:
>>>
>>>
>>> > So yes, this physical plan is the structure that you have in mind but
>>> > the idea of sharing it is not part of the CEP.
>>>
>>> I think it should be. This should form a major part of the API on which
>>> any CBO is built.
>>>
>>> > It seems that there is a difference between the goal of your proposal
>>> > and the one of the CEP. The goal of the CEP is first to ensure optimal
>>> > performance. It is ok to change the execution plan for one that delivers
>>> > better performance. What we want to minimize is having a node performing
>>> > queries in an inefficient way for a long period of time.
>>>
>>> You have made a goal of the CEP synchronising summary statistics across
>>> the whole cluster in order to achieve some degree of uniformity of query
>>> plan. So this is explicitly a goal of the CEP, and synchronising summary
>>> statistics is a hard problem and won’t provide strong guarantees.
>>>
>>> > The client side proposal targets consistency for a given query on a
>>> > given driver instance. In practice, it would be possible to have 2 similar
>>> > queries with 2 different execution plans on the same driver
>>>
>>> This would only be possible if the driver permitted it. A driver could
>>> (and should) enforce that it only permits one query plan per query.
>>>
>>> The opposite is true for your proposal: some queries may begin degrading
>>> because they touch specific replicas that optimise the query differently,
>>> and this will be hard to debug.
>>>
>>>
>>>
>>> On 14 Dec 2023, at 15:30, Benjamin Lerer <b.le...@gmail.com> wrote:
>>>
>>> The binding of the parser output to the schema (what is today the
>>> Raw.prepare call) will create the logical plan, expressed as a tree of
>>> relational operators. Simplification and normalization will happen on that
>>> tree to produce a new equivalent logical plan. That logical plan will be
>>> used as input to the optimizer. The output will be a physical plan
>>> producing the output specified by the logical plan: a tree of physical
>>> operators specifying how the operations should be performed.
>>>
>>> That physical plan will be stored as part of the statements
>>> (SelectStatement, ModificationStatement, ...) in the prepared statement
>>> cache. Upon execution, variables will be bound and the
>>> RangeCommands/Mutations will be created based on the physical plan.
>>>
>>> The string representation of a physical plan will effectively represent
>>> the output of an EXPLAIN statement but outside of that the physical plan
>>> will stay encapsulated within the statement classes.
>>> Hints will be parameters provided to the optimizer to enforce some
>>> specific choices, such as always using an Index Scan instead of a Table Scan,
>>> ignoring the cost comparison.
>>>
>>> So yes, this physical plan is the structure that you have in mind but
>>> the idea of sharing it is not part of the CEP. I did not document it
>>> because it will simply be a tree of physical operators used internally.
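
A minimal sketch of how a hint could override the cost comparison described
above. The PhysicalPlan type, the choose_plan function, the force_operator
parameter and the cost numbers are assumptions for illustration only; the CEP
does not define this API.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class PhysicalPlan:
    operator: str          # e.g. "IndexScan" or "TableScan"
    estimated_cost: float  # made-up cost units


def choose_plan(candidates: Sequence[PhysicalPlan],
                force_operator: Optional[str] = None) -> PhysicalPlan:
    """Pick the cheapest candidate unless a hint forces an operator."""
    if force_operator is not None:
        forced = [p for p in candidates if p.operator == force_operator]
        if forced:
            # The hint wins even if another operator would be cheaper.
            return min(forced, key=lambda p: p.estimated_cost)
    # No hint, or no candidate matches the hint: plain cost comparison.
    return min(candidates, key=lambda p: p.estimated_cost)


candidates = [
    PhysicalPlan("TableScan", estimated_cost=10.0),
    PhysicalPlan("IndexScan", estimated_cost=25.0),
]
best = choose_plan(candidates)
hinted = choose_plan(candidates, force_operator="IndexScan")
```

Falling back to the cost comparison when the hinted operator is unavailable is
one possible design choice; rejecting the query would be another.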
>>>
>>> > My proposal is that the execution plan of the coordinator that prepares
>>> > a query gets serialised to the client, which then provides the execution
>>> > plan to all future coordinators, and coordinators provide it to replicas as
>>> > necessary.
>>> >
>>> > This means it is not possible for any conflict to arise for a single
>>> > client. It would guarantee consistency of execution for any single client
>>> > (and avoid any drift over the client’s sessions), without necessarily
>>> > guaranteeing consistency for all clients.
>>>
>>>
>>>  It seems that there is a difference between the goal of your proposal
>>> and the one of the CEP. The goal of the CEP is first to ensure optimal
>>> performance. It is ok to change the execution plan for one that delivers
>>> better performance. What we want to minimize is having a node performing
>>> queries in an inefficient way for a long period of time.
>>>
>>> The client side proposal targets consistency for a given query on a
>>> given driver instance. In practice, it would be possible to have 2 similar
>>> queries with 2 different execution plans on the same driver making things
>>> really confusing. Identifying the source of an inefficient query will also
>>> be pretty hard.
>>>
>>> Interestingly, having 2 nodes with 2 different execution plans might not
>>> be a serious problem. It simply means that based on cardinality at t1, the
>>> optimizer on node 1 chose plan 1 while the one on node 2 chose plan 2 at
>>> t2. In practice, if the cost estimates properly reflect the actual cost,
>>> those 2 plans should have pretty similar efficiency. The problem is more
>>> about the fact that you would ideally want a uniform behavior across your
>>> cluster.
>>> Changes of execution plans should only occur at certain points. So the
>>> main problematic scenario is when the data distribution is around one of
>>> those points, which is also the point where the change should have the
>>> least impact.
>>>
>>>
>>>
>>> On Thu, Dec 14, 2023 at 11:38, Benedict <bened...@apache.org> wrote:
>>>
>>>
>>> There surely needs to be a more succinct and abstract representation in
>>> order to perform transformations on the query plan? You don’t intend to
>>> manipulate the object graph directly as you apply any transformations when
>>> performing simplification or cost based analysis? This would also (I
>>> expect) be the form used to support EXPLAIN functionality, and probably
>>> also HINTs etc. This would ideally *not* be coupled to the CBO itself,
>>> and would ideally be succinctly serialised.
>>>
>>> I would very much expect the query plan to be represented abstractly as
>>> part of this work, and for there to be a mechanism that translates this
>>> abstract representation into the object graph that executes it.
>>>
>>> If I’m incorrect, could you please elaborate more specifically how you
>>> intend to go about this?
>>>
>>>
>>> On 14 Dec 2023, at 10:33, Benjamin Lerer <b.le...@gmail.com> wrote:
>>>
>>> > I mean that an important part of this work - not specified in the CEP
>>> > (AFAICT) - should probably be to define some standard execution model, that
>>> > we can manipulate and serialise, for use across (and without) optimisers.
>>>
>>>
>>> I am confused because for me an execution model defines how operations
>>> are executed within the database in a conceptual way, which is not
>>> something that this CEP intends to change. Do you mean the
>>> physical/execution plan?
>>> Today this plan is represented, in some form, for reads by the
>>> SelectStatement and its components (Selections, StatementRestrictions, ...).
>>> It is then converted at execution time, after parameter binding, into a
>>> ReadCommand which is sent to the replicas.
>>> We plan to refactor SelectStatement and its components but the
>>> ReadCommands change should be relatively small. What you are proposing is
>>> not part of the scope of this CEP.
>>>
>>> On Thu, Dec 14, 2023 at 10:24, Benjamin Lerer <b.le...@gmail.com> wrote:
>>>
>>> > Can you share the reasons why Apache Calcite is not suitable for this
>>> > case and why it was rejected
>>>
>>>
>>> My understanding is that Calcite was made for two main things: to help
>>> with optimizing SQL-like languages and to let people query different kinds
>>> of data sources together.
>>>
>>> We could think about using it for our needs, but there are some big
>>> problems:
>>>
>>>    1. CQL is not SQL. There are significant differences between the 2
>>>       languages.
>>>    2. Cassandra has its own specificities that will influence the cost
>>>       model and the way we deal with optimizations: partitions, replication
>>>       factors, consistency levels, LSM tree storage, ...
>>>    3. Every framework comes with its own limitations and additional cost.
>>>
>>> From my view, there are too many big differences between what Calcite
>>> does and what we need in Cassandra. If we used Calcite, it would also mean
>>> relying a lot on another system that everyone would have to learn and
>>> adjust to. The problems and extra work this would bring don't seem worth
>>> the benefits we might get.
>>>
>>>
>>> On Wed, Dec 13, 2023 at 18:06, Benjamin Lerer <b.le...@gmail.com> wrote:
>>>
>>> One thing that I did not mention is the fact that this CEP is only a
>>> high level proposal. There will be deeper discussions on the dev list
>>> around the different parts of this proposal when we reach those parts and
>>> have enough details to make those discussions more meaningful.
>>>
>>>
>>> > The maintenance and distribution of summary statistics in particular is
>>> > worthy of its own CEP, and it might be preferable to split it out.
>>>
>>>
>>> For maintaining node statistics, the idea is to re-use the current
>>> Memtable/SSTable mechanism and rely on mergeable statistics. That will
>>> allow us to easily build node level statistics for a given table by merging
>>> all the statistics of its memtables and SSTables. For the distribution of
>>> these node statistics we are still exploring different options. We can come
>>> back with a precise proposal once we have hammered out all the details.
>>> Is this a blocker for this CEP for you, or do you just want to make sure
>>> that this part is discussed in more detail before we implement it?
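
To make "mergeable statistics" concrete, here is a minimal sketch assuming
each memtable and SSTable carries a small summary that can be combined
pairwise. The ColumnStats type, its fields and the sample numbers are
hypothetical; the CEP does not specify which statistics are kept.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class ColumnStats:
    """Hypothetical per-memtable or per-SSTable summary for one column."""
    row_count: int
    min_value: int
    max_value: int

    def merge(self, other: "ColumnStats") -> "ColumnStats":
        # Merging is associative and commutative, so summaries can be
        # combined in any order as SSTables are flushed or compacted.
        return ColumnStats(
            row_count=self.row_count + other.row_count,
            min_value=min(self.min_value, other.min_value),
            max_value=max(self.max_value, other.max_value),
        )


memtable = ColumnStats(row_count=10, min_value=5, max_value=40)
sstables = [
    ColumnStats(row_count=1000, min_value=0, max_value=99),
    ColumnStats(row_count=250, min_value=50, max_value=120),
]

# Node level statistics: fold every per-structure summary into one.
node_level = reduce(ColumnStats.merge, sstables, memtable)
```

A plain sum of row counts overcounts rows that appear in more than one
SSTable, so an actual design would likely need sketches that tolerate
duplicates (for example HyperLogLog for distinct counts), which are also
mergeable.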
>>>
>>>
>>>
>>>
>>> > The proposal also seems to imply we are aiming for coordinators to all
>>> > make the same decision for a query, which I think is challenging, and it
>>> > would be worth fleshing out the design here a little (perhaps just in Jira).
>>>
>>>
>>>
>>> The goal is that the large majority of nodes preparing a query at a
>>> given point in time should make the same decision and that over time all
>>> nodes should converge toward the same decision. This part is dependent on
>>> the node statistics distribution, the cost model and the triggers for
>>> re-optimization (that will require some experimentation).
>>>
>>> > There’s also not much discussion of the execution model: I think it
>>> > would make most sense for this to be independent of any cost and optimiser
>>> > models (though they might want to operate on them), so that EXPLAIN and
>>> > hints can work across optimisers (a suitable hint might essentially bypass
>>> > the optimiser, if the optimiser permits it, by providing a standard
>>> > execution model)
>>>
>>>
>>> It is not clear to me what you mean by "a standard execution model"?
>>> Otherwise, we were not planning to have the execution model or the hints
>>> depending on the optimizer.
>>>
>>> > I think it would be worth considering providing the execution plan to
>>> > the client as part of query preparation, as an opaque payload to supply to
>>> > coordinators on first contact, as this might simplify the problem of
>>> > ensuring queries behave the same without adopting a lot of complexity for
>>> > synchronising statistics (which will never provide strong guarantees). Of
>>> > course, re-preparing a query might lead to a new plan, though any
>>> > coordinators with the query in their cache should be able to retrieve it
>>> > cheaply. If the execution model is efficiently serialised this might have
>>> > the ancillary benefit of improving the occupancy of our prepared query
>>> > cache.
>>>
>>>
>>> I am not sure that I understand your proposal. If 2 nodes build a
>>> different execution plan how do you solve that conflict?
>>>
>>> On Wed, Dec 13, 2023 at 09:55, Benedict <bened...@apache.org> wrote:
>>>
>>>
>>> A CBO can only make worse decisions than the status quo for what I
>>> presume are the majority of queries - i.e. those that touch only primary
>>> indexes. In general, there are plenty of use cases that prefer determinism.
>>> So I agree that there should at least be a CBO implementation that makes
>>> the same decisions as the status quo, deterministically.
>>>
>>> I do support the proposal, but would like to see some elements discussed
>>> in more detail. The maintenance and distribution of summary statistics in
>>> particular is worthy of its own CEP, and it might be preferable to split it
>>> out. The proposal also seems to imply we are aiming for coordinators to all
>>> make the same decision for a query, which I think is challenging, and it
>>> would be worth fleshing out the design here a little (perhaps just in Jira).
>>>
>>> While I’m not a fan of ALLOW FILTERING, I’m not convinced that this CEP
>>> deprecates it. It is a concrete qualitative guard rail, that I expect some
>>> users will prefer to a cost-based guard rail. Perhaps this could be left to
>>> the CBO to decide how to treat.
>>>
>>> There’s also not much discussion of the execution model: I think it
>>> would make most sense for this to be independent of any cost and optimiser
>>> models (though they might want to operate on them), so that EXPLAIN and
>>> hints can work across optimisers (a suitable hint might essentially bypass
>>> the optimiser, if the optimiser permits it, by providing a standard
>>> execution model)
>>>
>>> I think it would be worth considering providing the execution plan to
>>> the client as part of query preparation, as an opaque payload to supply to
>>> coordinators on first contact, as this might simplify the problem of
>>> ensuring queries behave the same without adopting a lot of complexity for
>>> synchronising statistics (which will never provide strong guarantees). Of
>>> course, re-preparing a query might lead to a new plan, though any
>>> coordinators with the query in their cache should be able to retrieve it
>>> cheaply. If the execution model is efficiently serialised this might have
>>> the ancillary benefit of improving the occupancy of our prepared query
>>> cache.
>>>
>>>
>>> On 13 Dec 2023, at 00:44, Jon Haddad <j...@jonhaddad.com> wrote:
>>>
>>> I think it makes sense to see what the actual overhead is of CBO before
>>> making the assumption it'll be so high that we need to have two code
>>> paths.  I'm happy to provide thorough benchmarking and analysis when it
>>> reaches a testing phase.
>>>
>>> I'm excited to see where this goes.  I think it sounds very forward
>>> looking and opens up a lot of possibilities.
>>>
>>> Jon
>>>
>>> On Tue, Dec 12, 2023 at 4:25 PM guo Maxwell <cclive1...@gmail.com>
>>> wrote:
>>>
>>> Nothing expresses my thoughts better than +1.
>>> It feels like it means a lot to Cassandra.
>>>
>>> I have a question. Is it easy to turn off the CBO or bypass it in
>>> some way? Some simple read and write requests will have better
>>> performance without the CBO, which is also an advantage of Cassandra
>>> compared to some RDBMSs.
>>>
>>> On Wed, Dec 13, 2023 at 3:37 AM, David Capwell <dcapw...@apple.com> wrote:
>>>
>>> Overall LGTM.
>>>
>>>
>>> On Dec 12, 2023, at 5:29 AM, Benjamin Lerer <ble...@apache.org> wrote:
>>>
>>> Hi everybody,
>>>
>>> I would like to open the discussion on the introduction of a cost based
>>> optimizer to allow Cassandra to pick the best execution plan based on the
>>> data distribution, thereby improving the overall query performance.
>>>
>>> This CEP should also lay the groundwork for the future addition of
>>> features like joins, subqueries, OR/NOT and index ordering.
>>>
>>> The proposal is here:
>>> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-39%3A+Cost+Based+Optimizer
>>>
>>> Thank you in advance for your feedback.
>>>
>>>
>
