Re: [DISCUSS] Index selection syntax for CASSANDRA-18112

guo Maxwell Mon, 30 Dec 2024 19:13:08 -0800

If we can also build CBO capabilities for Cassandra in the future, it will
be of great help to Cassandra in expanding its ability to analyze big data.
If CEP-39 can be adopted, is it possible to divide the big task to some
sub-tasks and combine the power of the community to jointly advance this
matter?


But I wonder if the syntax for index selection can also be supported? I
don't know the background of SAI optimization, but I think the issue of
index selection is necessary and meaningful.
This is why I opened this issue. For example, MySQL supports CBO, but also
provides users with an entry to select/ignore indexes.


Benjamin Lerer <b.le...@gmail.com> 于2024年12月31日周二 00:36写道：

> I am personally -1 against the approach taken by DS for optimizing SAI
> queries. I consider it as a quick fix rather than a proper long term
> solution.
> The issue with this approach is that it splits the optimisation logic of
> queries. With one part of the optimization happening on the coordinator
> while the other is performed on the replicas. By doing so it will make the
> introduction of features such as global indexes much harder as those
> features will need to be plugged to the optimization logic.
> I strongly believe that the proper solution for query optimization is what
> was proposed in CEP-39: a cost based query optimizer located on the
> coordinator.
> Unfortunately do not have the time to work on it.
>
>
> Le lun. 23 déc. 2024 à 07:11, guo Maxwell <cclive1...@gmail.com> a écrit :
>
>> Wrong typing, it is DML not DDL ~~~😅
>>
>> guo Maxwell <cclive1...@gmail.com> 于2024年12月23日周一 11:56写道：
>>
>>> Thanks Caleb for bringing it up.
>>>
>>> When I opened this issue, I originally wanted to propose the following
>>> syntax:
>>> 1、When we want to force the selection of certain indexes:
>>> SELECT ...USE INDEX (ks.idx1, ks.idx2....)  ... FROM ...WHERE ...
>>>  or
>>> SELECT .../*+ USE INDEX (ks.idx1, ks.idx2....)*/  ... FROM ...WHERE ...
>>>
>>> 2、When we want to ignore the selection of some indexes:
>>> SELECT ...IGNORE INDEX (ks.idx1, ks.idx2....)  ... FROM ...  WHERE ...
>>> or
>>> SELECT .../*+ IGNORE INDEX (ks.idx1, ks.idx2....) */ ... FROM ...  WHERE
>>> ...
>>>
>>> And In fact, what I originally wanted to do was to provide a hint
>>> function
>>> <https://dev.mysql.com/doc/refman/8.0/en/optimizer-hints.html#optimizer-hints-overview>
>>> similar to MySQL, which can push down some rules  dynamically.
>>> The work to provide a hint for DDL needs to be accomplished in many
>>> steps, CASSANDRA-18112 for index hint is part of it, maybe the first step.
>>>
>>>
>>> Ekaterina Dimitrova <e.dimitr...@gmail.com> 于2024年12月21日周六 22:27写道：
>>>
>>>> Naming is hard but to me providing what Caleb mentioned through
>>>> something like WITH OPTIONS sounds reasonable. Thanks for bringing it
>>>> up.
>>>>
>>>> On Sat, 21 Dec 2024 at 2:46, Joel Shepherd <sheph...@amazon.com> wrote:
>>>>
>>>>> WITH INDEX (or something equivalent) seems really useful.
>>>>>
>>>>> Less opinionated on the specific syntax, but I think there is a lot of
>>>>> value in the form of predictable, controllable performance, in giving
>>>>> developers more direct control over query execution, whether that's
>>>>> index selection or even lower-level decisions. If you've experienced
>>>>> the
>>>>> thrill of operating a database with a cost-based planner that abruptly
>>>>> selects a new, sub-optimal plan due to a change in statistics or
>>>>> configuration, you'll appreciate language features that yield some
>>>>> planning control back to you. It does increase the burden on the
>>>>> developer to understand how best to execute the query, but it makes
>>>>> their intent much more obvious, and easier to adjust as the system
>>>>> changes.
>>>>>
>>>>> -- Joel.
>>>>>
>>>>> On 12/20/2024 12:28 PM, Caleb Rackliffe wrote:
>>>>> > Some of your are probably familiar with work in the DS fork to
>>>>> improve
>>>>> > the selection of indexes for SAI queries in
>>>>> >
>>>>> https://github.com/datastax/cassandra/commit/eeb33dd62b9b74ecf818a263fd73dbe6714b0df0#diff-2830028723b7f4af5ec7450fae2c206aeefa5a2c3455eff6f4a0734a85cb5424.
>>>>>
>>>>> >
>>>>> >
>>>>> > While I'm eagerly anticipating working on that in the new year, I'm
>>>>> > also wondering whether we think some simple CQL extensions to
>>>>> manually
>>>>> > control index selection would be helpful. Maxwell proposed this a
>>>>> > while back in CASSANDRA-18112, and I'd like to propose a syntax:
>>>>> >
>>>>> >
>>>>> > ex. Do not use the specified index during the query.
>>>>> >
>>>>> > SELECT ... FROM ... WHERE ... WITHOUT INDEX <ks.idx>
>>>>> >
>>>>> > This could be helpful for intersection queries where one of the
>>>>> > provided clauses is not very selective and could simply be handled
>>>>> via
>>>>> > post-filtering.
>>>>> >
>>>>> > ex. Require the specified index to be used.
>>>>> >
>>>>> > SELECT ... FROM ... WHERE ... WITH INDEX <ks.idx>
>>>>> >
>>>>> > This could be helpful in scenarios where multiple indexes exist on a
>>>>> > column and was the primary motivation for CASSANDRA-18112.
>>>>> >
>>>>> > Thoughts?
>>>>>
>>>>

Re: [DISCUSS] Index selection syntax for CASSANDRA-18112

Reply via email to