Did this get resolved? Is it ready for a VOTE thread?

On Tue, Jan 2, 2024 at 1:41 PM Benedict <bened...@apache.org> wrote:

> The CEP expressly includes an item for coordinated cardinality estimation,
> by producing whole cluster summaries. I’m not sure if you addressed this in
> your feedback; it’s not clear what you’re referring to with distributed
> estimates, but avoiding this was expressly the driver of my suggestion to
> instead include the plan as a payload (which offers users some additional
> facilities).
>
> On 2 Jan 2024, at 21:26, Ariel Weisberg <ar...@weisberg.ws> wrote:
>
> Hi,
>
> I am burying the lede, but it's important to keep an eye on
> runtime-adaptive vs planning time optimization as the cost/benefits vary
> greatly between the two and runtime adaptive can be a game changer.
> Basically CBO optimizes for query efficiency and startup time at the
> expense of not handling some queries well, and runtime adaptive is
> cheap/free for expensive queries and can handle cases that CBO can't.
>
> Generally speaking I am +1 on the introduction of a CBO, since it seems
> like there exist things that would benefit from it materially (and many of
> the associated refactors/cleanup) and it aligns with my north star that
> includes joins.
>
> Do we all have the same north star that Cassandra should eventually
> support joins? Just curious if that is controversial.
>
> I don't feel like this CEP in particular should need to really nail down
> exactly how distributed estimates work since we can start with using local
> estimates as a proxy for the entire cluster and then improve. If someone
> has bandwidth to do a separate CEP for that then sure that would be great,
> but this seems big enough in scope already.
>
> RE testing, continuity of performance of queries is going to be really
> important. I would really like to see that we have fuzzed the space
> deterministically and via a collection of hand rolled cases, and can
> compare performance between versions to catch queries that regress.
> Hopefully we can agree on a baseline for releasing where we know what prior
> release to compare to and what acceptable changes in performance are.
>
> RE prepared statements - It feels to me like trying to send the plan blob
> back and forth to get more predictable, but not absolutely predictable,
> plans is not worth it? Feels like a lot for an incremental improvement over
> a baseline that doesn't exist yet, IOW it doesn't feel like something for
> V1. Maybe it ends up in YAGNI territory.
>
> The north star of predictable behavior for queries is a *very* important
> one because it means the world to users, but CBO is going to make mistakes
> all over the place. It's simply unachievable even with accurate statistics
> because it's very hard to tell how predicates will behave on a column.
>
> This segues nicely into the importance of adaptive execution :-) It's how
> you rescue the queries that CBO doesn't handle well for any reason such as
> bugs, bad statistics, or missing features. Re-ordering predicate evaluation,
> switching indexes, and re-ordering joins can all be done on the fly.
>
> CBO is really a performance optimization since adaptive approaches will
> allow any query to complete with some wasted resources.
>
> If my pager were waking me up at night and I wanted to stem the bleeding I
> would reach for runtime adaptive over CBO because I know it will catch more
> cases even if it is slower to execute up front.
>
> What is the nature of the queries we are looking to solve right now? Are they
> long running heavy hitters, or short queries that explode if run
> incorrectly, or a mix of both?
>
> Ariel
>
> On Tue, Dec 12, 2023, at 8:29 AM, Benjamin Lerer wrote:
>
> Hi everybody,
>
> I would like to open the discussion on the introduction of a cost based
> optimizer to allow Cassandra to pick the best execution plan based on the
> data distribution, thereby improving the overall query performance.
>
> This CEP should also lay the groundwork for the future addition of
> features like joins, subqueries, OR/NOT and index ordering.
>
> The proposal is here:
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-39%3A+Cost+Based+Optimizer
>
> Thank you in advance for your feedback.