Did this get resolved? Is it ready for a VOTE thread?

On Tue, Jan 2, 2024 at 1:41 PM Benedict <bened...@apache.org> wrote:

> The CEP expressly includes an item for coordinated cardinality estimation,
> by producing whole cluster summaries. I’m not sure if you addressed this in
> your feedback; it’s not clear what you’re referring to with distributed
> estimates, but avoiding this was expressly the driver of my suggestion to
> instead include the plan as a payload (which offers users some additional
> facilities).
>
>
> On 2 Jan 2024, at 21:26, Ariel Weisberg <ar...@weisberg.ws> wrote:
>
> 
> Hi,
>
> I am burying the lede, but it's important to keep an eye on
> runtime-adaptive vs planning time optimization as the cost/benefits vary
> greatly between the two and runtime adaptive can be a game changer.
> Basically CBO optimizes for query efficiency and startup time at the
> expense of not handling some queries well and runtime adaptive is
> cheap/free for expensive queries and can handle cases that CBO can't.
>
> Generally speaking I am +1 on the introduction of a CBO, since it seems
> like there exist things that would benefit from it materially (and many of
> the associated refactors/cleanup) and it aligns with my north star that
> includes joins.
>
> Do we all have the same north star that Cassandra should eventually
> support joins? Just curious if that is controversial.
>
> I don't feel like this CEP in particular should need to really nail down
> exactly how distributed estimates work since we can start with using local
> estimates as a proxy for the entire cluster and then improve. If someone
> has bandwidth to do a separate CEP for that then sure that would be great,
> but this seems big enough in scope already.
>
> RE testing, continuity of performance of queries is going to be really
> important. I would really like to see that we have fuzzed the space
> deterministically and via a collection of hand rolled cases, and can
> compare performance between versions to catch queries that regress.
> Hopefully we can agree on a baseline for releasing where we know what prior
> release to compare to and what acceptable changes in performance are.
>
> RE prepared statements - It feels to me like trying to send the plan blob
> back and forth to get more predictable, but not absolutely predictable,
> plans is not worth it? Feels like a lot for an incremental improvement over
> a baseline that doesn't exist yet, IOW it doesn't feel like something for
> V1. Maybe it ends up in YAGNI territory.
>
> The north star of predictable behavior for queries is a *very* important
> one because it means the world to users, but CBO is going to make mistakes
> all over the place. It's simply unachievable even with accurate statistics
> because it's very hard to tell how predicates will behave on a column.
>
> This segues nicely into the importance of adaptive execution :-) It's how
> you rescue the queries that CBO doesn't handle well for any reason such as
> bugs, bad statistics, missing features. Re-ordering predicate evaluation,
> switching indexes, and re-ordering joins can all be done on the fly.
>
> CBO is really a performance optimization since adaptive approaches will
> allow any query to complete with some wasted resources.
>
> If my pager were waking me up at night and I wanted to stem the bleeding I
> would reach for runtime adaptive over CBO because I know it will catch more
> cases even if it is slower to execute up front.
>
> What is the nature of the queries we are looking to solve right now? Are they
> long running heavy hitters, or short queries that explode if run
> incorrectly, or a mix of both?
>
> Ariel
>
> On Tue, Dec 12, 2023, at 8:29 AM, Benjamin Lerer wrote:
>
> Hi everybody,
>
> I would like to open the discussion on the introduction of a cost based
> optimizer to allow Cassandra to pick the best execution plan based on the
> data distribution, thereby improving overall query performance.
>
> This CEP should also lay the groundwork for the future addition of
> features like joins, subqueries, OR/NOT and index ordering.
>
> The proposal is here:
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-39%3A+Cost+Based+Optimizer
>
> Thank you in advance for your feedback.
>
>
>
