[
https://issues.apache.org/jira/browse/CASSANDRA-16052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17634964#comment-17634964
]
Caleb Rackliffe commented on CASSANDRA-16052:
---------------------------------------------
Had a quick chat w/ [~mikea] earlier today, and we've come up w/ a phased plan
for getting the SAI components reviewed, into the feature branch, and
delivered. The following will all likely correspond to their own Jiras attached
to this epic:
*Phase 1 - Index API and Memtable Indexing*
The simplest component of SAI we can break off first, review, and test is the
Memtable-adjacent index and SAI's integration w/ the C* 2i API. When this phase
is complete, we should be able to create indexes on text and numeric data and
query those indexes while the base table data still resides in memory.
*Phase 2 - SSTable Indexing Tools and On-Disk Format for Text Indexing*
With the 2i integration and Memtable indexes working, we can introduce the
on-disk components that make SSTable indexing possible generally (index
building, result collation between Memtable indexes and SSTable indexes,
SSTable-level shared data for multiple indexed columns) and the first user of
those components, the disk-based trie that supports text indexing. When this
phase is complete, we should have an end-to-end solution for basic text
indexing.
*Phase 3 - On-Disk Format for Numeric Indexing*
With the general tools that support SSTable indexing complete in phase 2, we
can add the on-disk format for numeric indexing. With this phase complete,
we'll have end-to-end support for numeric equality and range queries.
*Phase 4 - Harry*
Whether we've already developed a model for testing generic indexing/filtering
or not, at the conclusion of phase 3, we'll want to figure out the best way for
Harry to exercise SAI. ([~ifesdjeen] and I have had some preliminary discussion
around this.) This is ordered after the first 3 phases in a gatekeeping sense,
but given SAI is just an indexing _implementation_, work on a Harry model could
happen before or concurrently with them.
*Phase 5 - LIKE Support and Statement Restriction Cleanup*
At the conclusion of phase 4, we should have a solid working version of SAI
that supports basic numeric and text indexing. However, we may still want to
build support for text prefix queries via the {{LIKE}} operator to get to rough
feature parity w/ SASI. (Whether we need suffix/contains/full text regex
support is more debatable.) Also, there are some superficial bits of cleanup we
may need to do in CQL space around when certain boolean queries in SAI (like in
SASI) require {{ALLOW FILTERING}} even when the query only has restrictions on
indexed columns.
> CEP-7 Storage Attached Indexes
> ------------------------------
>
> Key: CASSANDRA-16052
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16052
> Project: Cassandra
> Issue Type: Epic
> Components: Feature/2i Index
> Reporter: Zhao Yang
> Assignee: Caleb Rackliffe
> Priority: Normal
> Fix For: 4.x
>
> Time Spent: 40m
> Remaining Estimate: 0h
>
> [CEP|https://docs.google.com/document/d/1V830eAMmQAspjJdjviVZIaSolVGvZ1hVsqOLWyV0DS4/edit#heading=h.67ap6rr1mxr]
> - A new index implementation, called Storage
> Attached Index (SAI), based on the advancements made by SASI.
> * disk usage by sharing of common data between multiple column indexes on
> the same table and better compression of on-disk structures.
> * numeric range query performance with modified KDTree and collection type
> support.
> * compaction performance and stability for larger data sets.