> > So yes, this physical plan is the structure that you have in mind but > the idea of sharing it is not part of the CEP.
Sorry, Benedict, what I meant by sharing was sharing across the nodes. It is an integral part of the optimizer API that the CEP talks about as it represents its output. I did not realize that this part was not clear. Le jeu. 14 déc. 2023 à 17:15, Benedict <bened...@apache.org> a écrit : > Fwiw Chris, I agree with your concerns, but I think the introduction of a > CBO - done right - is in principle a good thing in its own right. It’s > independent of the issues you mention, even if it might enable features > that exacerbate them. > > It should also help enable secondary indexes work better, which is I think > their main justification. Otherwise there isn’t really a good reason for it > today. Though I personally anticipate ongoing issues around 2i that SAI is > perhaps over exuberantly sold as solving, and that a CBO will not fix. But > we’ll see how that evolves. > > > > On 14 Dec 2023, at 15:49, Chris Lohfink <clohfin...@gmail.com> wrote: > > > I don't wanna be a blocker for this CEP or anything but did want to put my > 2 cents in. This CEP is horrifying to me. > > I have seen thousands of clusters across multiple companies and helped > them get working successfully. A vast majority of that involved blocking > the use of MVs, GROUP BY, secondary indexes, and even just simple _range > queries_. The "unncessary restrictions of cql" are not only necessary IMHO, > more restrictions are necessary to be successful at scale. The idea of just > opening up CQL to general purpose relational queries and lines like > "supporting > queries with joins in an efficient way" ... I would really like us to > make secondary indexes be a viable option before we start opening up > floodgates on stuff like this. > > Chris > > On Thu, Dec 14, 2023 at 9:37 AM Benedict <bened...@apache.org> wrote: > >> > So yes, this physical plan is the structure that you have in mind but >> the idea of sharing it is not part of the CEP. >> >> >> I think it should be. This should form a major part of the API on which >> any CBO is built. >> >> >> > It seems that there is a difference between the goal of your proposal >> and the one of the CEP. The goal of the CEP is first to ensure optimal >> performance. It is ok to change the execution plan for one that delivers >> better performance. What we want to minimize is having a node performing >> queries in an inefficient way for a long period of time. >> >> >> You have made a goal of the CEP synchronising summary statistics across >> the whole cluster in order to achieve some degree of uniformity of query >> plan. So this is explicitly a goal of the CEP, and synchronising summary >> statistics is a hard problem and won’t provide strong guarantees. >> >> >> > The client side proposal targets consistency for a given query on a >> given driver instance. In practice, it would be possible to have 2 similar >> queries with 2 different execution plans on the same driver >> >> >> This would only be possible if the driver permitted it. A driver could >> (and should) enforce that it only permits one query plan per query. >> >> >> The opposite is true for your proposal: some queries may begin degrading >> because they touch specific replicas that optimise the query differently, >> and this will be hard to debug. >> >> >> On 14 Dec 2023, at 15:30, Benjamin Lerer <b.le...@gmail.com> wrote: >> >> >> The binding of the parser output to the schema (what is today the >> Raw.prepare call) will create the logical plan, expressed as a tree of >> relational operators. Simplification and normalization will happen on that >> tree to produce a new equivalent logical plan. That logical plan will be >> used as input to the optimizer. The output will be a physical plan >> producing the output specified by the logical plan. A tree of physical >> operators specifying how the operations should be performed. >> >> That physical plan will be stored as part of the statements >> (SelectStatement, ModificationStatement, ...) in the prepared statement >> cache. Upon execution, variables will be bound and the >> RangeCommands/Mutations will be created based on the physical plan. >> >> The string representation of a physical plan will effectively represent >> the output of an EXPLAIN statement but outside of that the physical plan >> will stay encapsulated within the statement classes. >> Hints will be parameters provided to the optimizer to enforce some >> specific choices. Like always using an Index Scan instead of a Table Scan, >> ignoring the cost comparison. >> >> So yes, this physical plan is the structure that you have in mind but the >> idea of sharing it is not part of the CEP. I did not document it because it >> will simply be a tree of physical operators used internally. >> >> My proposal is that the execution plan of the coordinator that prepares a >>> query gets serialised to the client, which then provides the execution plan >>> to all future coordinators, and coordinators provide it to replicas as >>> necessary. >>> >>> >>> This means it is not possible for any conflict to arise for a single >>> client. It would guarantee consistency of execution for any single client >>> (and avoid any drift over the client’s sessions), without necessarily >>> guaranteeing consistency for all clients. >>> >> >> It seems that there is a difference between the goal of your proposal >> and the one of the CEP. The goal of the CEP is first to ensure optimal >> performance. It is ok to change the execution plan for one that delivers >> better performance. What we want to minimize is having a node performing >> queries in an inefficient way for a long period of time. >> >> The client side proposal targets consistency for a given query on a given >> driver instance. In practice, it would be possible to have 2 similar >> queries with 2 different execution plans on the same driver making things >> really confusing. Identifying the source of an inefficient query will also >> be pretty hard. >> >> Interestingly, having 2 nodes with 2 different execution plans might not >> be a serious problem. It simply means that based on cardinality at t1, the >> optimizer on node 1 chose plan 1 while the one on node 2 chose plan 2 at >> t2. In practice if the cost estimates reflect properly the actual cost >> those 2 plans should have pretty similar efficiency. The problem is more >> about the fact that you would ideally want a uniform behavior around your >> cluster. >> Changes of execution plans should only occur at certain points. So the >> main problematic scenario is when the data distribution is around one of >> those points. Which is also the point where the change should have the >> least impact. >> >> >> Le jeu. 14 déc. 2023 à 11:38, Benedict <bened...@apache.org> a écrit : >> >>> There surely needs to be a more succinct and abstract representation in >>> order to perform transformations on the query plan? You don’t intend to >>> manipulate the object graph directly as you apply any transformations when >>> performing simplification or cost based analysis? This would also (I >>> expect) be the form used to support EXPLAIN functionality, and probably >>> also HINTs etc. This would ideally *not* be coupled to the CBO itself, >>> and would ideally be succinctly serialised. >>> >>> I would very much expect the query plan to be represented abstractly as >>> part of this work, and for there to be a mechanism that translates this >>> abstract representation into the object graph that executes it. >>> >>> If I’m incorrect, could you please elaborate more specifically how you >>> intend to go about this? >>> >>> On 14 Dec 2023, at 10:33, Benjamin Lerer <b.le...@gmail.com> wrote: >>> >>> >>> >>>> I mean that an important part of this work - not specified in the CEP >>>> (AFAICT) - should probably be to define some standard execution model, that >>>> we can manipulate and serialise, for use across (and without) optimisers. >>> >>> >>> I am confused because for me an execution model defines how operations >>> are executed within the database in a conceptual way, which is not >>> something that this CEP intends to change. Do you mean the >>> physical/execution plan? >>> Today this plan is somehow represented for reads by the SelectStatement >>> and its components (Selections, StatementRestrictions, ...) it is then >>> converted at execution time after parameter binding into a ReadCommand >>> which is sent to the replicas. >>> We plan to refactor SelectStatement and its components but the >>> ReadCommands change should be relatively small. What you are proposing is >>> not part of the scope of this CEP. >>> >>> Le jeu. 14 déc. 2023 à 10:24, Benjamin Lerer <b.le...@gmail.com> a >>> écrit : >>> >>>> Can you share the reasons why Apache Calcite is not suitable for this >>>>> case and why it was rejected >>>> >>>> >>>> My understanding is that Calcite was made for two main things: to help >>>> with optimizing SQL-like languages and to let people query different kinds >>>> of data sources together. >>>> >>>> We could think about using it for our needs, but there are some big >>>> problems: >>>> >>>> 1. >>>> >>>> CQL is not SQL. There are significant differences between the 2 >>>> languages >>>> 2. >>>> >>>> Cassandra has its own specificities that will influence the cost >>>> model and the way we deal with optimizations: partitions, replication >>>> factors, consistency levels, LSM tree storage, ... >>>> 3. >>>> >>>> Every framework comes with its own limitations and additional cost >>>> >>>> From my view, there are too many big differences between what Calcite >>>> does and what we need in Cassandra. If we used Calcite, it would also mean >>>> relying a lot on another system that everyone would have to learn and >>>> adjust to. The problems and extra work this would bring don't seem worth >>>> the benefits we might get >>>> >>>> >>>> Le mer. 13 déc. 2023 à 18:06, Benjamin Lerer <b.le...@gmail.com> a >>>> écrit : >>>> >>>>> One thing that I did not mention is the fact that this CEP is only a >>>>> high level proposal. There will be deeper discussions on the dev list >>>>> around the different parts of this proposal when we reach those parts and >>>>> have enough details to make those discussions more meaningful. >>>>> >>>>> >>>>>> The maintenance and distribution of summary statistics in particular >>>>>> is worthy of its own CEP, and it might be preferable to split it out. >>>>> >>>>> >>>>> For maintaining node statistics the idea is to re-use the current >>>>> Memtable/SSTable mechanism and relies on mergeable statistics. That will >>>>> allow us to easily build node level statistics for a given table by >>>>> merging >>>>> all the statistics of its memtable and SSTables. For the distribution of >>>>> these node statistics we are still exploring different options. We can >>>>> come >>>>> back with a precise proposal once we have hammered all the details. >>>>> Is it for you a blocker for this CEP or do you just want to make sure >>>>> that this part is discussed in deeper details before we implement it? >>>>> >>>>>> >>>>>> The proposal also seems to imply we are aiming for coordinators to >>>>>> all make the same decision for a query, which I think is challenging, and >>>>>> it would be worth fleshing out the design here a little (perhaps just in >>>>>> Jira). >>>>> >>>>> >>>>> The goal is that the large majority of nodes preparing a query at a >>>>> given point in time should make the same decision and that over time all >>>>> nodes should converge toward the same decision. This part is dependent on >>>>> the node statistics distribution, the cost model and the triggers for >>>>> re-optimization (that will require some experimentation). >>>>> >>>>> There’s also not much discussion of the execution model: I think it >>>>>> would make most sense for this to be independent of any cost and >>>>>> optimiser >>>>>> models (though they might want to operate on them), so that EXPLAIN and >>>>>> hints can work across optimisers (a suitable hint might essentially >>>>>> bypass >>>>>> the optimiser, if the optimiser permits it, by providing a standard >>>>>> execution model) >>>>>> >>>>> >>>>> It is not clear to me what you mean by "a standard execution model"? >>>>> Otherwise, we were not planning to have the execution model or the hints >>>>> depending on the optimizer. >>>>> >>>>> I think it would be worth considering providing the execution plan to >>>>>> the client as part of query preparation, as an opaque payload to supply >>>>>> to >>>>>> coordinators on first contact, as this might simplify the problem of >>>>>> ensuring queries behave the same without adopting a lot of complexity for >>>>>> synchronising statistics (which will never provide strong guarantees). Of >>>>>> course, re-preparing a query might lead to a new plan, though any >>>>>> coordinators with the query in their cache should be able to retrieve it >>>>>> cheaply. If the execution model is efficiently serialised this might have >>>>>> the ancillary benefit of improving the occupancy of our prepared query >>>>>> cache. >>>>>> >>>>> >>>>> I am not sure that I understand your proposal. If 2 nodes build a >>>>> different execution plan how do you solve that conflict? >>>>> >>>>> Le mer. 13 déc. 2023 à 09:55, Benedict <bened...@apache.org> a écrit : >>>>> >>>>>> A CBO can only make worse decisions than the status quo for what I >>>>>> presume are the majority of queries - i.e. those that touch only primary >>>>>> indexes. In general, there are plenty of use cases that prefer >>>>>> determinism. >>>>>> So I agree that there should at least be a CBO implementation that makes >>>>>> the same decisions as the status quo, deterministically. >>>>>> >>>>>> >>>>>> I do support the proposal, but would like to see some elements >>>>>> discussed in more detail. The maintenance and distribution of summary >>>>>> statistics in particular is worthy of its own CEP, and it might be >>>>>> preferable to split it out. The proposal also seems to imply we are >>>>>> aiming >>>>>> for coordinators to all make the same decision for a query, which I think >>>>>> is challenging, and it would be worth fleshing out the design here a >>>>>> little >>>>>> (perhaps just in Jira). >>>>>> >>>>>> >>>>>> While I’m not a fan of ALLOW FILTERING, I’m not convinced that this >>>>>> CEP deprecates it. It is a concrete qualitative guard rail, that I expect >>>>>> some users will prefer to a cost-based guard rail. Perhaps this could be >>>>>> left to the CBO to decide how to treat. >>>>>> >>>>>> >>>>>> There’s also not much discussion of the execution model: I think it >>>>>> would make most sense for this to be independent of any cost and >>>>>> optimiser >>>>>> models (though they might want to operate on them), so that EXPLAIN and >>>>>> hints can work across optimisers (a suitable hint might essentially >>>>>> bypass >>>>>> the optimiser, if the optimiser permits it, by providing a standard >>>>>> execution model) >>>>>> >>>>>> >>>>>> I think it would be worth considering providing the execution plan to >>>>>> the client as part of query preparation, as an opaque payload to supply >>>>>> to >>>>>> coordinators on first contact, as this might simplify the problem of >>>>>> ensuring queries behave the same without adopting a lot of complexity for >>>>>> synchronising statistics (which will never provide strong guarantees). Of >>>>>> course, re-preparing a query might lead to a new plan, though any >>>>>> coordinators with the query in their cache should be able to retrieve it >>>>>> cheaply. If the execution model is efficiently serialised this might have >>>>>> the ancillary benefit of improving the occupancy of our prepared query >>>>>> cache. >>>>>> >>>>>> On 13 Dec 2023, at 00:44, Jon Haddad <j...@jonhaddad.com> wrote: >>>>>> >>>>>> >>>>>> I think it makes sense to see what the actual overhead is of CBO >>>>>> before making the assumption it'll be so high that we need to have two >>>>>> code >>>>>> paths. I'm happy to provide thorough benchmarking and analysis when it >>>>>> reaches a testing phase. >>>>>> >>>>>> I'm excited to see where this goes. I think it sounds very forward >>>>>> looking and opens up a lot of possibilities. >>>>>> >>>>>> Jon >>>>>> >>>>>> On Tue, Dec 12, 2023 at 4:25 PM guo Maxwell <cclive1...@gmail.com> >>>>>> wrote: >>>>>> >>>>>>> Nothing expresses my thoughts better than +1 >>>>>>> ,It feels like it means a lot to Cassandra. >>>>>>> >>>>>>> I have a question. Is it easy to turn off cbo's optimizer or by pass >>>>>>> in some way? Because some simple read and write requests will have >>>>>>> better >>>>>>> performance without cbo, which is also the advantage of Cassandra >>>>>>> compared >>>>>>> to some rdbms. >>>>>>> >>>>>>> >>>>>>> David Capwell <dcapw...@apple.com>于2023年12月13日 周三上午3:37写道: >>>>>>> >>>>>>>> Overall LGTM. >>>>>>>> >>>>>>>> >>>>>>>> On Dec 12, 2023, at 5:29 AM, Benjamin Lerer <ble...@apache.org> >>>>>>>> wrote: >>>>>>>> >>>>>>>> Hi everybody, >>>>>>>> >>>>>>>> I would like to open the discussion on the introduction of a cost >>>>>>>> based optimizer to allow Cassandra to pick the best execution plan >>>>>>>> based on >>>>>>>> the data distribution.Therefore, improving the overall query >>>>>>>> performance. >>>>>>>> >>>>>>>> This CEP should also lay the groundwork for the future addition of >>>>>>>> features like joins, subqueries, OR/NOT and index ordering. >>>>>>>> >>>>>>>> The proposal is here: >>>>>>>> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-39%3A+Cost+Based+Optimizer >>>>>>>> >>>>>>>> Thank you in advance for your feedback. >>>>>>>> >>>>>>>> >>>>>>>>