HNSW can in principle be made into a distributed index. But that would be quite a different paradigm to SAI.

On 9 May 2023, at 19:30, Patrick McFadin <pmcfa...@gmail.com> wrote:


Under the goals section, there is this line:

  1. Scatter/gather across replicas, combining topK from each to get global topK.

But what I'm hearing is, exactly how will that happen? Maybe this is an SAI question too. How is that verified in SAI?
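
As a rough illustration of the scatter/gather goal quoted above, here is a minimal sketch of the gather step. It is not taken from the CEP or from SAI: the function name `merge_top_k`, the `(score, row key)` hit shape, and the "higher score means closer" convention are all assumptions made for this example.

```python
import heapq
from typing import Dict, Iterable, List, Tuple

# Hypothetical hit shape: (similarity score, row key); higher score = closer.
Hit = Tuple[float, str]


def merge_top_k(per_replica_hits: Iterable[List[Hit]], k: int) -> List[Hit]:
    """Combine each replica's local top-k list into one global top-k list."""
    best: Dict[str, float] = {}
    for hits in per_replica_hits:
        for score, key in hits:
            # The same row may be returned by several replicas; keep its best score.
            if key not in best or score > best[key]:
                best[key] = score
    # Pick the k highest-scoring rows, ordered from best to worst.
    return heapq.nlargest(k, ((score, key) for key, score in best.items()))
```

The sketch only covers combining results at the coordinator. Each replica's list is itself approximate, so the merged list is approximate too.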

On Tue, May 9, 2023 at 11:07 AM David Capwell <dcapw...@apple.com> wrote:
The Approach section doesn't cover how this will handle cross-replica search; that would be good to flesh out. Given that results have a real ranking, the current 2i logic may yield incorrect results, so I would think we need num_ranges / rf queries in the best case, plus some new capability to sort the results? If my assumption is correct, then how errors are handled should also be fleshed out. Example: a 1k-node cluster without vnodes and RF=3, so roughly 333 queries fanned out to match, and then the coordinator needs to sort. If one of the queries fails and can't fall back to peers, does the query fail (I assume so)?
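
The fan-out arithmetic and the failure question above can be sketched as follows. This is an illustration under stated assumptions, not Cassandra's coordinator logic: `best_case_fanout`, `gather`, and `SubQueryFailed` are hypothetical names, a failed sub-query with no peer fallback is modelled as `None`, and failing the whole query in that case is the behaviour assumed in the message, not a confirmed design.

```python
import math
from typing import List, Optional, Tuple

Hit = Tuple[float, str]  # hypothetical (similarity score, row key)


class SubQueryFailed(Exception):
    """Raised when a range sub-query failed and no peer could serve it."""


def best_case_fanout(num_ranges: int, rf: int) -> int:
    # Assumes each sub-query can cover up to rf ranges; round up so that
    # every range is covered by some sub-query.
    return math.ceil(num_ranges / rf)


def gather(sub_results: List[Optional[List[Hit]]], k: int) -> List[Hit]:
    # None marks a sub-query that failed with no fallback. A ranked result
    # with a missing range could silently omit true neighbours, so this
    # sketch fails the whole query instead of returning a partial answer.
    if any(hits is None for hits in sub_results):
        raise SubQueryFailed("a range sub-query failed with no fallback")
    merged = [hit for hits in sub_results for hit in hits]
    # The coordinator-side sort: best score first, then keep the top k.
    return sorted(merged, reverse=True)[:k]
```

For the example cluster, `best_case_fanout(1000, 3)` gives 334, because 1000 / 3 is rounded up to cover every range; the 333 in the message is the rounded-down figure.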

On May 8, 2023, at 7:20 PM, Jonathan Ellis <jbel...@gmail.com> wrote:

Hi all,

Following the recent discussion threads, I would like to propose CEP-30 to add Approximate Nearest Neighbor (ANN) Vector Search via Storage-Attached Indexes (SAI) to Apache Cassandra.

The primary goal of this proposal is to implement ANN vector search capabilities, making Cassandra more useful to AI developers and organizations managing large datasets that can benefit from fast similarity search.

The implementation will leverage Lucene's Hierarchical Navigable Small World (HNSW) library and introduce a new CQL data type for vector embeddings, a new SAI index for ANN search functionality, and a new CQL operator for performing ANN search queries.

We are targeting the 5.0 release for this feature, in conjunction with the release of SAI. The proposed changes will maintain compatibility with existing Cassandra functionality and compose well with the already-approved SAI features.

Please find the full CEP document here: https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-30%3A+Approximate+Nearest+Neighbor%28ANN%29+Vector+Search+via+Storage-Attached+Indexes

--
Jonathan Ellis
co-founder, http://www.datastax.com
@spyced
