Thanks for time in addressing concerns. At least with initial versions, as long as there is a way to replace it with noop or disable it I would be happy. This is pretty standard practice with features nowadays but I wanted to highlight it as this might require some pretty tight coupling.
Chris On Fri, Dec 15, 2023 at 7:57 AM Benjamin Lerer <ble...@apache.org> wrote: > Hey Chris, > You raise some valid points. > > I believe that there are 3 points that you mentioned: > 1) CQL restrictions are some form of safety net and should be kept > 2) A lot of Cassandra features do not scale and/or are too easy to use in > a wrong way that can make the whole system collapse. We should not add more > to that list. Especially not joins. > 3) Should we not start to fix features like secondary index rather than > adding new ones? Which is heavily linked to 2). > > Feel free to correct me if I got them wrong or missed one. > > Regarding 1), I believe that you refer to the "Removing unnecessary CQL > query limitations and inconsistencies" section. We are not planning to > remove any safety net here. > What we want to remove is a certain amount of limitations which make > things confusing for a user trying to write a query for no good reason. > Like "why can I define a column alias but not use it anywhere in my query?" > or "Why can I not create a list with 2 bind parameters?". While refactoring > some CQL code, I kept on finding those types of exceptions that we can > easily remove while simplifying the code at the same time. > > For 2), I agree that at a certain scale or for some scenarios, some > features simply do not scale or catch users by surprise. The goal of the > CEP is to improve things in 2 ways. One is by making Cassandra smarter in > the way it chooses how to process queries, hopefully improving its overall > scalability. The other by being transparent about how Cassandra will > execute the queries through the use of EXPLAIN. One problem of GROUP BY for > example is that most users do not realize what is actually happening under > the hood and therefore its limitations. I do not believe that EXPLAIN will > change everything but it will help people to get a better understanding of > the limitations of some features. > > I do not know which features will be added in the future to C*. That will > be discussed through some future CEPs. Nevertheless, I do not believe that > it makes sense to write a CEP for a query optimizer without taking into > account that we might at some point add some level of support for joins or > subqueries. We have been too often delivering features without looking at > what could be the possible evolutions which resulted in code where adding > new features was more complex than it should have been. I do not want to > make the same mistake. I want to create an optimizer that can be improved > easily and considering joins or other features simply help to build things > in a more generic way. > > Regarding feature stabilization, I believe that it is happening. I have > heard plans of how to solve MVs, range queries, hot partitions, ... and > there was a lot of thinking behind those plans. Secondary indexes are being > worked on. We hope that the optimizer will also help with some index > queries. > > It seems to me that this proposal is going toward the direction that you > want without introducing new problems for scalability. > > > Le jeu. 14 déc. 2023 à 16:47, Chris Lohfink <clohfin...@gmail.com> a > écrit : > >> I don't wanna be a blocker for this CEP or anything but did want to put >> my 2 cents in. This CEP is horrifying to me. >> >> I have seen thousands of clusters across multiple companies and helped >> them get working successfully. A vast majority of that involved blocking >> the use of MVs, GROUP BY, secondary indexes, and even just simple _range >> queries_. The "unncessary restrictions of cql" are not only necessary IMHO, >> more restrictions are necessary to be successful at scale. The idea of just >> opening up CQL to general purpose relational queries and lines like >> "supporting >> queries with joins in an efficient way" ... I would really like us to >> make secondary indexes be a viable option before we start opening up >> floodgates on stuff like this. >> >> Chris >> >> On Thu, Dec 14, 2023 at 9:37 AM Benedict <bened...@apache.org> wrote: >> >>> > So yes, this physical plan is the structure that you have in mind but >>> the idea of sharing it is not part of the CEP. >>> >>> >>> I think it should be. This should form a major part of the API on which >>> any CBO is built. >>> >>> >>> > It seems that there is a difference between the goal of your proposal >>> and the one of the CEP. The goal of the CEP is first to ensure optimal >>> performance. It is ok to change the execution plan for one that delivers >>> better performance. What we want to minimize is having a node performing >>> queries in an inefficient way for a long period of time. >>> >>> >>> You have made a goal of the CEP synchronising summary statistics across >>> the whole cluster in order to achieve some degree of uniformity of query >>> plan. So this is explicitly a goal of the CEP, and synchronising summary >>> statistics is a hard problem and won’t provide strong guarantees. >>> >>> >>> > The client side proposal targets consistency for a given query on a >>> given driver instance. In practice, it would be possible to have 2 similar >>> queries with 2 different execution plans on the same driver >>> >>> >>> This would only be possible if the driver permitted it. A driver could >>> (and should) enforce that it only permits one query plan per query. >>> >>> >>> The opposite is true for your proposal: some queries may begin degrading >>> because they touch specific replicas that optimise the query differently, >>> and this will be hard to debug. >>> >>> >>> On 14 Dec 2023, at 15:30, Benjamin Lerer <b.le...@gmail.com> wrote: >>> >>> >>> The binding of the parser output to the schema (what is today the >>> Raw.prepare call) will create the logical plan, expressed as a tree of >>> relational operators. Simplification and normalization will happen on that >>> tree to produce a new equivalent logical plan. That logical plan will be >>> used as input to the optimizer. The output will be a physical plan >>> producing the output specified by the logical plan. A tree of physical >>> operators specifying how the operations should be performed. >>> >>> That physical plan will be stored as part of the statements >>> (SelectStatement, ModificationStatement, ...) in the prepared statement >>> cache. Upon execution, variables will be bound and the >>> RangeCommands/Mutations will be created based on the physical plan. >>> >>> The string representation of a physical plan will effectively represent >>> the output of an EXPLAIN statement but outside of that the physical plan >>> will stay encapsulated within the statement classes. >>> Hints will be parameters provided to the optimizer to enforce some >>> specific choices. Like always using an Index Scan instead of a Table Scan, >>> ignoring the cost comparison. >>> >>> So yes, this physical plan is the structure that you have in mind but >>> the idea of sharing it is not part of the CEP. I did not document it >>> because it will simply be a tree of physical operators used internally. >>> >>> My proposal is that the execution plan of the coordinator that prepares >>>> a query gets serialised to the client, which then provides the execution >>>> plan to all future coordinators, and coordinators provide it to replicas as >>>> necessary. >>>> >>>> >>>> This means it is not possible for any conflict to arise for a single >>>> client. It would guarantee consistency of execution for any single client >>>> (and avoid any drift over the client’s sessions), without necessarily >>>> guaranteeing consistency for all clients. >>>> >>> >>> It seems that there is a difference between the goal of your proposal >>> and the one of the CEP. The goal of the CEP is first to ensure optimal >>> performance. It is ok to change the execution plan for one that delivers >>> better performance. What we want to minimize is having a node performing >>> queries in an inefficient way for a long period of time. >>> >>> The client side proposal targets consistency for a given query on a >>> given driver instance. In practice, it would be possible to have 2 similar >>> queries with 2 different execution plans on the same driver making things >>> really confusing. Identifying the source of an inefficient query will also >>> be pretty hard. >>> >>> Interestingly, having 2 nodes with 2 different execution plans might not >>> be a serious problem. It simply means that based on cardinality at t1, the >>> optimizer on node 1 chose plan 1 while the one on node 2 chose plan 2 at >>> t2. In practice if the cost estimates reflect properly the actual cost >>> those 2 plans should have pretty similar efficiency. The problem is more >>> about the fact that you would ideally want a uniform behavior around your >>> cluster. >>> Changes of execution plans should only occur at certain points. So the >>> main problematic scenario is when the data distribution is around one of >>> those points. Which is also the point where the change should have the >>> least impact. >>> >>> >>> Le jeu. 14 déc. 2023 à 11:38, Benedict <bened...@apache.org> a écrit : >>> >>>> There surely needs to be a more succinct and abstract representation in >>>> order to perform transformations on the query plan? You don’t intend to >>>> manipulate the object graph directly as you apply any transformations when >>>> performing simplification or cost based analysis? This would also (I >>>> expect) be the form used to support EXPLAIN functionality, and probably >>>> also HINTs etc. This would ideally *not* be coupled to the CBO itself, >>>> and would ideally be succinctly serialised. >>>> >>>> I would very much expect the query plan to be represented abstractly as >>>> part of this work, and for there to be a mechanism that translates this >>>> abstract representation into the object graph that executes it. >>>> >>>> If I’m incorrect, could you please elaborate more specifically how you >>>> intend to go about this? >>>> >>>> On 14 Dec 2023, at 10:33, Benjamin Lerer <b.le...@gmail.com> wrote: >>>> >>>> >>>> >>>>> I mean that an important part of this work - not specified in the CEP >>>>> (AFAICT) - should probably be to define some standard execution model, >>>>> that >>>>> we can manipulate and serialise, for use across (and without) optimisers. >>>> >>>> >>>> I am confused because for me an execution model defines how operations >>>> are executed within the database in a conceptual way, which is not >>>> something that this CEP intends to change. Do you mean the >>>> physical/execution plan? >>>> Today this plan is somehow represented for reads by the SelectStatement >>>> and its components (Selections, StatementRestrictions, ...) it is then >>>> converted at execution time after parameter binding into a ReadCommand >>>> which is sent to the replicas. >>>> We plan to refactor SelectStatement and its components but the >>>> ReadCommands change should be relatively small. What you are proposing is >>>> not part of the scope of this CEP. >>>> >>>> Le jeu. 14 déc. 2023 à 10:24, Benjamin Lerer <b.le...@gmail.com> a >>>> écrit : >>>> >>>>> Can you share the reasons why Apache Calcite is not suitable for this >>>>>> case and why it was rejected >>>>> >>>>> >>>>> My understanding is that Calcite was made for two main things: to help >>>>> with optimizing SQL-like languages and to let people query different kinds >>>>> of data sources together. >>>>> >>>>> We could think about using it for our needs, but there are some big >>>>> problems: >>>>> >>>>> 1. >>>>> >>>>> CQL is not SQL. There are significant differences between the 2 >>>>> languages >>>>> 2. >>>>> >>>>> Cassandra has its own specificities that will influence the cost >>>>> model and the way we deal with optimizations: partitions, replication >>>>> factors, consistency levels, LSM tree storage, ... >>>>> 3. >>>>> >>>>> Every framework comes with its own limitations and additional cost >>>>> >>>>> From my view, there are too many big differences between what Calcite >>>>> does and what we need in Cassandra. If we used Calcite, it would also mean >>>>> relying a lot on another system that everyone would have to learn and >>>>> adjust to. The problems and extra work this would bring don't seem worth >>>>> the benefits we might get >>>>> >>>>> >>>>> Le mer. 13 déc. 2023 à 18:06, Benjamin Lerer <b.le...@gmail.com> a >>>>> écrit : >>>>> >>>>>> One thing that I did not mention is the fact that this CEP is only a >>>>>> high level proposal. There will be deeper discussions on the dev list >>>>>> around the different parts of this proposal when we reach those parts and >>>>>> have enough details to make those discussions more meaningful. >>>>>> >>>>>> >>>>>>> The maintenance and distribution of summary statistics in particular >>>>>>> is worthy of its own CEP, and it might be preferable to split it out. >>>>>> >>>>>> >>>>>> For maintaining node statistics the idea is to re-use the current >>>>>> Memtable/SSTable mechanism and relies on mergeable statistics. That will >>>>>> allow us to easily build node level statistics for a given table by >>>>>> merging >>>>>> all the statistics of its memtable and SSTables. For the distribution of >>>>>> these node statistics we are still exploring different options. We can >>>>>> come >>>>>> back with a precise proposal once we have hammered all the details. >>>>>> Is it for you a blocker for this CEP or do you just want to make sure >>>>>> that this part is discussed in deeper details before we implement it? >>>>>> >>>>>>> >>>>>>> The proposal also seems to imply we are aiming for coordinators to >>>>>>> all make the same decision for a query, which I think is challenging, >>>>>>> and >>>>>>> it would be worth fleshing out the design here a little (perhaps just in >>>>>>> Jira). >>>>>> >>>>>> >>>>>> The goal is that the large majority of nodes preparing a query at a >>>>>> given point in time should make the same decision and that over time all >>>>>> nodes should converge toward the same decision. This part is dependent on >>>>>> the node statistics distribution, the cost model and the triggers for >>>>>> re-optimization (that will require some experimentation). >>>>>> >>>>>> There’s also not much discussion of the execution model: I think it >>>>>>> would make most sense for this to be independent of any cost and >>>>>>> optimiser >>>>>>> models (though they might want to operate on them), so that EXPLAIN and >>>>>>> hints can work across optimisers (a suitable hint might essentially >>>>>>> bypass >>>>>>> the optimiser, if the optimiser permits it, by providing a standard >>>>>>> execution model) >>>>>>> >>>>>> >>>>>> It is not clear to me what you mean by "a standard execution model"? >>>>>> Otherwise, we were not planning to have the execution model or the hints >>>>>> depending on the optimizer. >>>>>> >>>>>> I think it would be worth considering providing the execution plan to >>>>>>> the client as part of query preparation, as an opaque payload to supply >>>>>>> to >>>>>>> coordinators on first contact, as this might simplify the problem of >>>>>>> ensuring queries behave the same without adopting a lot of complexity >>>>>>> for >>>>>>> synchronising statistics (which will never provide strong guarantees). >>>>>>> Of >>>>>>> course, re-preparing a query might lead to a new plan, though any >>>>>>> coordinators with the query in their cache should be able to retrieve it >>>>>>> cheaply. If the execution model is efficiently serialised this might >>>>>>> have >>>>>>> the ancillary benefit of improving the occupancy of our prepared query >>>>>>> cache. >>>>>>> >>>>>> >>>>>> I am not sure that I understand your proposal. If 2 nodes build a >>>>>> different execution plan how do you solve that conflict? >>>>>> >>>>>> Le mer. 13 déc. 2023 à 09:55, Benedict <bened...@apache.org> a >>>>>> écrit : >>>>>> >>>>>>> A CBO can only make worse decisions than the status quo for what I >>>>>>> presume are the majority of queries - i.e. those that touch only primary >>>>>>> indexes. In general, there are plenty of use cases that prefer >>>>>>> determinism. >>>>>>> So I agree that there should at least be a CBO implementation that makes >>>>>>> the same decisions as the status quo, deterministically. >>>>>>> >>>>>>> >>>>>>> I do support the proposal, but would like to see some elements >>>>>>> discussed in more detail. The maintenance and distribution of summary >>>>>>> statistics in particular is worthy of its own CEP, and it might be >>>>>>> preferable to split it out. The proposal also seems to imply we are >>>>>>> aiming >>>>>>> for coordinators to all make the same decision for a query, which I >>>>>>> think >>>>>>> is challenging, and it would be worth fleshing out the design here a >>>>>>> little >>>>>>> (perhaps just in Jira). >>>>>>> >>>>>>> >>>>>>> While I’m not a fan of ALLOW FILTERING, I’m not convinced that this >>>>>>> CEP deprecates it. It is a concrete qualitative guard rail, that I >>>>>>> expect >>>>>>> some users will prefer to a cost-based guard rail. Perhaps this could be >>>>>>> left to the CBO to decide how to treat. >>>>>>> >>>>>>> >>>>>>> There’s also not much discussion of the execution model: I think it >>>>>>> would make most sense for this to be independent of any cost and >>>>>>> optimiser >>>>>>> models (though they might want to operate on them), so that EXPLAIN and >>>>>>> hints can work across optimisers (a suitable hint might essentially >>>>>>> bypass >>>>>>> the optimiser, if the optimiser permits it, by providing a standard >>>>>>> execution model) >>>>>>> >>>>>>> >>>>>>> I think it would be worth considering providing the execution plan >>>>>>> to the client as part of query preparation, as an opaque payload to >>>>>>> supply >>>>>>> to coordinators on first contact, as this might simplify the problem of >>>>>>> ensuring queries behave the same without adopting a lot of complexity >>>>>>> for >>>>>>> synchronising statistics (which will never provide strong guarantees). >>>>>>> Of >>>>>>> course, re-preparing a query might lead to a new plan, though any >>>>>>> coordinators with the query in their cache should be able to retrieve it >>>>>>> cheaply. If the execution model is efficiently serialised this might >>>>>>> have >>>>>>> the ancillary benefit of improving the occupancy of our prepared query >>>>>>> cache. >>>>>>> >>>>>>> On 13 Dec 2023, at 00:44, Jon Haddad <j...@jonhaddad.com> wrote: >>>>>>> >>>>>>> >>>>>>> I think it makes sense to see what the actual overhead is of CBO >>>>>>> before making the assumption it'll be so high that we need to have two >>>>>>> code >>>>>>> paths. I'm happy to provide thorough benchmarking and analysis when it >>>>>>> reaches a testing phase. >>>>>>> >>>>>>> I'm excited to see where this goes. I think it sounds very forward >>>>>>> looking and opens up a lot of possibilities. >>>>>>> >>>>>>> Jon >>>>>>> >>>>>>> On Tue, Dec 12, 2023 at 4:25 PM guo Maxwell <cclive1...@gmail.com> >>>>>>> wrote: >>>>>>> >>>>>>>> Nothing expresses my thoughts better than +1 >>>>>>>> ,It feels like it means a lot to Cassandra. >>>>>>>> >>>>>>>> I have a question. Is it easy to turn off cbo's optimizer or by >>>>>>>> pass in some way? Because some simple read and write requests will have >>>>>>>> better performance without cbo, which is also the advantage of >>>>>>>> Cassandra >>>>>>>> compared to some rdbms. >>>>>>>> >>>>>>>> >>>>>>>> David Capwell <dcapw...@apple.com>于2023年12月13日 周三上午3:37写道: >>>>>>>> >>>>>>>>> Overall LGTM. >>>>>>>>> >>>>>>>>> >>>>>>>>> On Dec 12, 2023, at 5:29 AM, Benjamin Lerer <ble...@apache.org> >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>> Hi everybody, >>>>>>>>> >>>>>>>>> I would like to open the discussion on the introduction of a cost >>>>>>>>> based optimizer to allow Cassandra to pick the best execution plan >>>>>>>>> based on >>>>>>>>> the data distribution.Therefore, improving the overall query >>>>>>>>> performance. >>>>>>>>> >>>>>>>>> This CEP should also lay the groundwork for the future addition >>>>>>>>> of features like joins, subqueries, OR/NOT and index ordering. >>>>>>>>> >>>>>>>>> The proposal is here: >>>>>>>>> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-39%3A+Cost+Based+Optimizer >>>>>>>>> >>>>>>>>> Thank you in advance for your feedback. >>>>>>>>> >>>>>>>>> >>>>>>>>>