[jira] [Commented] (CASSANDRA-16052) CEP-7 Storage Attached Indexes

Caleb Rackliffe (Jira) Wed, 16 Nov 2022 10:27:05 -0800


    [ 
https://issues.apache.org/jira/browse/CASSANDRA-16052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17634964#comment-17634964
 ]


Caleb Rackliffe commented on CASSANDRA-16052:
---------------------------------------------

Had a quick chat w/ [~mikea] earlier today, and we've come up w/ a phased plan 
for getting the SAI components reviewed, merged into the feature branch, and 
delivered. Each of the following phases will likely get its own Jira attached 
to this epic:

*Phase 1 - Index API and Memtable Indexing*

The simplest component of SAI we can break off first, review, and test is the 
Memtable-adjacent index and SAI's integration w/ the C* 2i API. When this phase 
is complete, we should be able to create indexes on text and numeric data and 
query those indexes while the base table data still resides in memory.
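As a rough illustration of what a Memtable-adjacent index does, here is a toy Python sketch. The class {{ToyMemtableIndex}} and its methods are hypothetical and do not reflect SAI's actual classes or the C* 2i API. It simply maps each indexed value to the sorted primary keys of in-memory rows holding that value, and it ignores overwrites and deletes.

```python
from bisect import insort
from collections import defaultdict


class ToyMemtableIndex:
    """Hypothetical in-memory index: maps each indexed value to the sorted
    primary keys of rows that currently hold that value in the memtable.
    Overwrites and deletes are deliberately ignored in this sketch."""

    def __init__(self):
        self._postings = defaultdict(list)

    def index(self, primary_key, value):
        keys = self._postings[value]
        # Keep each posting list sorted and duplicate-free so it can later
        # be merged with results from other indexes.
        if primary_key not in keys:
            insort(keys, primary_key)

    def search_eq(self, value):
        # Equality lookup; return a copy so callers cannot mutate the index.
        return list(self._postings.get(value, []))


idx = ToyMemtableIndex()
idx.index(3, "red")
idx.index(1, "red")
idx.index(2, "blue")
print(idx.search_eq("red"))  # [1, 3]
```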

*Phase 2 - SSTable Indexing Tools and On-Disk Format for Text Indexing*

With the 2i integration and Memtable indexes working, we can introduce the 
on-disk components that make SSTable indexing possible generally (index 
building, result collation between Memtable indexes and SSTable indexes, 
SSTable-level shared data for multiple indexed columns) and the first user of 
those components, the disk-based trie that supports text indexing. When this 
phase is complete, we should have an end-to-end solution for basic text 
indexing.
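To make "result collation between Memtable indexes and SSTable indexes" a bit more concrete, here is a minimal Python sketch. It assumes each index returns its matching primary keys as an ascending stream; the function {{collate}} is hypothetical, not SAI code. It does a k-way merge with {{heapq.merge}} and drops duplicate keys, since the same row can match in more than one source.

```python
import heapq
from itertools import groupby


def collate(*sorted_key_streams):
    """Merge several ascending primary-key streams (e.g. one from a memtable
    index and one per SSTable index) into a single ascending, de-duplicated
    list. A key can appear in more than one source when a row was written,
    flushed, and written again."""
    merged = heapq.merge(*sorted_key_streams)
    # groupby collapses runs of equal keys produced by the merge.
    return [key for key, _ in groupby(merged)]


memtable_hits = [2, 7, 9]
sstable_a_hits = [1, 2, 5]
sstable_b_hits = [5, 9, 12]
print(collate(memtable_hits, sstable_a_hits, sstable_b_hits))
# [1, 2, 5, 7, 9, 12]
```

A real read path would presumably also need to re-check each candidate against the merged row, because a match in an older SSTable can be stale once the row has been overwritten; the sketch leaves that out.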

*Phase 3 - On-Disk Format for Numeric Indexing*

With the general tools that support SSTable indexing complete in phase 2, we 
can add the on-disk format for numeric indexing. With this phase complete, 
we'll have end-to-end support for numeric equality and range queries.
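For numeric equality and range queries, the epic description quoted at the end of this comment mentions a modified KDTree. The Python sketch below swaps in a much simpler structure, a sorted array searched with {{bisect}}, only to show the shape of a per-SSTable numeric lookup; {{ToyNumericSegment}} is hypothetical. Keys come back in ascending order so they could feed the same kind of collation step used for text indexes.

```python
from bisect import bisect_left, bisect_right


class ToyNumericSegment:
    """Hypothetical immutable numeric index for one SSTable: (value, key)
    pairs sorted by value. A plain sorted array stands in here for the
    modified KDTree mentioned in the CEP."""

    def __init__(self, rows):
        # rows: iterable of (primary_key, numeric_value)
        pairs = sorted((value, key) for key, value in rows)
        self._values = [v for v, _ in pairs]
        self._keys = [k for _, k in pairs]

    def range(self, lo, hi):
        """Return primary keys with lo <= value <= hi, in ascending key order."""
        start = bisect_left(self._values, lo)
        end = bisect_right(self._values, hi)
        return sorted(self._keys[start:end])

    def eq(self, value):
        # Equality is just a degenerate range.
        return self.range(value, value)


seg = ToyNumericSegment([(1, 30), (2, 15), (3, 42), (4, 15)])
print(seg.range(10, 35))  # [1, 2, 4]
print(seg.eq(15))         # [2, 4]
```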

*Phase 4 - Harry*

Whether we've already developed a model for testing generic indexing/filtering 
or not, at the conclusion of phase 3, we'll want to figure out the best way for 
Harry to exercise SAI. ([~ifesdjeen] and I have had some preliminary discussion 
around this.) This is ordered after the first 3 phases in a gatekeeping sense, 
but given SAI is just an indexing _implementation_, work on a Harry model could 
happen before or concurrently with them.

*Phase 5 - LIKE Support and Statement Restriction Cleanup*

At the conclusion of phase 4, we should have a solid working version of SAI 
that supports basic numeric and text indexing. However, we may still want to 
build support for text prefix queries via the {{LIKE}} operator to get to rough 
feature parity w/ SASI. (Whether we need suffix/contains/full text regex 
support is more debatable.) Also, we may need to do some superficial cleanup 
in CQL space around cases where certain boolean queries in SAI (as in SASI) 
require {{ALLOW FILTERING}} even when the query restricts only indexed 
columns.
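Prefix {{LIKE}} queries (e.g. {{LIKE 'ap%'}}) fit naturally on a sorted term dictionary, because all terms sharing a prefix form one contiguous run. In the disk-based trie this would presumably be a walk down to the prefix node; the Python sketch below uses a plain sorted list and binary search instead, and {{prefix_terms}} is a hypothetical helper, not SAI code.

```python
from bisect import bisect_left


def prefix_terms(sorted_terms, prefix):
    """Return every term in sorted_terms that starts with prefix.
    All matches form one contiguous run in a sorted dictionary, so a binary
    search finds the start and the scan stops at the first non-match."""
    start = bisect_left(sorted_terms, prefix)
    matches = []
    for term in sorted_terms[start:]:
        if not term.startswith(prefix):
            break
        matches.append(term)
    return matches


terms = sorted(["apple", "apricot", "banana", "application", "ape"])
print(prefix_terms(terms, "ap"))
# ['ape', 'apple', 'application', 'apricot']
```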

> CEP-7 Storage Attached Indexes
> ------------------------------
>
>                 Key: CASSANDRA-16052
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-16052
>             Project: Cassandra
>          Issue Type: Epic
>          Components: Feature/2i Index
>            Reporter: Zhao Yang
>            Assignee: Caleb Rackliffe
>            Priority: Normal
>             Fix For: 4.x
>
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> [CEP|https://docs.google.com/document/d/1V830eAMmQAspjJdjviVZIaSolVGvZ1hVsqOLWyV0DS4/edit#heading=h.67ap6rr1mxr]
>  - A new index implementation, called Storage Attached Index (SAI), based on
> the advancements made by SASI.
>  * disk usage by sharing common data between multiple column indexes on 
> the same table and better compression of on-disk structures.
>  * numeric range query performance with a modified KDTree, and collection 
> type support.
>  * compaction performance and stability for larger data sets.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
