[ 
https://issues.apache.org/jira/browse/CASSANDRA-19661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18056972#comment-18056972
 ] 

Michael Marshall edited comment on CASSANDRA-19661 at 2/6/26 7:51 PM:
----------------------------------------------------------------------

Here is a draft solution: https://github.com/apache/cassandra/pull/4605. It 
works by modifying UCS, while leaving the underlying vector design untouched.

In general, I see one key requirement: do not attempt to build a graph at flush 
time. It is expensive to build vector graphs, and we do not want to build up an 
excessive number of vectors in memory due to delays at flush time.

Therefore, I see two options:

1. Pre-shard the vector memtable index to align with flush shard bounds.
2. Prevent sharding at flush time. (This is the proposed solution in the draft 
PR. I haven't explicitly tested it yet because I want to get general feedback 
on the design before proceeding.)

From a vector perf perspective, we always want to build bigger graphs. The 
time complexity of graph search is `log(n)`. Searching `k` small graphs 
instead of one costs roughly `k * log(n)`, which significantly increases 
latency and reduces throughput. Given the benefit to read performance, I lean 
towards avoiding eager sharding.
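
To make the cost comparison concrete, here is a minimal sketch of an 
idealized model. It assumes search cost is exactly `log2` of the graph size, 
that the vectors are split evenly, and that every shard must be searched. The 
function names are made up for illustration and are not part of Cassandra or 
jvector. Under an even split each shard holds `n / k` vectors, so the sharded 
cost is `k * log(n / k)`, which is close to `k * log(n)` when `k` is much 
smaller than `n`.

```python
import math


def single_graph_cost(n: int) -> float:
    """Idealized search cost for one graph holding all n vectors."""
    return math.log2(n)


def sharded_cost(n: int, k: int) -> float:
    """Idealized total cost when n vectors are split evenly across k graphs
    and a query has to search every one of them."""
    return k * math.log2(n / k)


if __name__ == "__main__":
    n = 1_000_000
    for k in (1, 4, 16):
        ratio = sharded_cost(n, k) / single_graph_cost(n)
        print(f"k={k:>2}: {ratio:.1f}x the cost of a single graph")
```

Real graph search has constant factors and a merge step that this model 
ignores, so treat the ratios as a rough guide to the trend only.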

Note also that the `TrieMemoryIndex` differs from the `VectorMemoryIndex` in 
one significant way: the former needs a `synchronized` add method while the 
latter does not. (Actually, I just noticed that the `VectorMemoryIndex` has 
the `synchronized` keyword, but doesn't need it! I created a follow-up ticket: 
https://issues.apache.org/jira/browse/CASSANDRA-21160.) I mention this because 
the primary benefit of pre-sharding the memtable index is to increase write 
throughput so that writes to different partitions can proceed independently. 
The jvector library already allows concurrent writes, but the 
`TrieMemoryIndex` does not.
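
As a rough illustration of why pre-sharding helps an index whose add method 
takes a lock, here is a minimal sketch. It is a Python analogy, not Cassandra 
code: `ShardedIndex`, its `boundaries` argument, and the per-shard 
`threading.Lock` (standing in for Java's `synchronized`) are all hypothetical. 
Each write is routed to a shard by token, so writers to different shards never 
contend on the same lock.

```python
import bisect
import threading


class ShardedIndex:
    """Hypothetical in-memory index pre-sharded by token range."""

    def __init__(self, boundaries):
        # boundaries[i] is the exclusive upper token bound of shard i;
        # the last shard takes every token at or above the final boundary.
        self.boundaries = sorted(boundaries)
        shard_count = len(self.boundaries) + 1
        self.shards = [[] for _ in range(shard_count)]
        self.locks = [threading.Lock() for _ in range(shard_count)]

    def shard_for(self, token):
        """Return the index of the shard that owns this token."""
        return bisect.bisect_right(self.boundaries, token)

    def add(self, token, value):
        """Append under the owning shard's lock only."""
        i = self.shard_for(token)
        with self.locks[i]:
            self.shards[i].append((token, value))


if __name__ == "__main__":
    index = ShardedIndex([0, 100])
    for token in (-5, 50, 100):
        index.add(token, f"row-{token}")
    print([len(shard) for shard in index.shards])
```

An index that is already safe for concurrent writes gains nothing from the 
extra locks here, which is the point made above about jvector.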


> Cannot restart Cassandra 5 after creating a vector table and index
> ------------------------------------------------------------------
>
>                 Key: CASSANDRA-19661
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19661
>             Project: Apache Cassandra
>          Issue Type: Bug
>          Components: Feature/SAI, Feature/Vector Search, Local/Startup and 
> Shutdown
>            Reporter: Sergio Rua
>            Priority: Normal
>             Fix For: 5.0.x, 6.x
>
>         Attachments: 10.103.220.89_thread_dump.tgz, 
> 5.0.2_fail_memtableflush_vector_full.txt, logs.tar.gz, screenshot-1.png, 
> upload_content.py
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> I'm using llama-index and llama3 to train a model. I'm using a very simple 
> script that reads some *.txt files from local disk, uploads them to 
> Cassandra, and then creates the index:
>  
> {code:python}
> # Create the index from documents
> index = VectorStoreIndex.from_documents(
>     documents,
>     service_context=vector_store.service_context,
>     storage_context=storage_context,
>     show_progress=True,
>     ) {code}
> This works well and I'm able to use a Chat app to get responses from the 
> Cassandra data. However, right after that, I cannot restart Cassandra. It 
> breaks with the following error:
>  
> {code:java}
> INFO  [PerDiskMemtableFlushWriter_0:7] 2024-05-23 08:23:20,102 
> Flushing.java:179 - Completed flushing 
> /data/cassandra/data/gpt/docs_20240523-10c8eaa018d811ef8dadf75182f3e2b4/da-6-bti-Data.db
>  (124.236MiB) for commitlog position 
> CommitLogPosition(segmentId=1716452305636, position=15336)
> [...]
> WARN  [MemtableFlushWriter:1] 2024-05-23 08:28:29,575 
> MemtableIndexWriter.java:92 - [gpt.docs.idx_vector_docs] Aborting index 
> memtable flush for 
> /data/cassandra/data/gpt/docs-aea77a80184b11ef8dadf75182f3e2b4/da-3-bti...{code}
> {code:java}
> java.lang.IllegalStateException: null
>         at 
> com.google.common.base.Preconditions.checkState(Preconditions.java:496)
>         at 
> org.apache.cassandra.index.sai.disk.v1.vector.VectorPostings.computeRowIds(VectorPostings.java:76)
>         at 
> org.apache.cassandra.index.sai.disk.v1.vector.OnHeapGraph.writeData(OnHeapGraph.java:313)
>         at 
> org.apache.cassandra.index.sai.memory.VectorMemoryIndex.writeDirect(VectorMemoryIndex.java:272)
>         at 
> org.apache.cassandra.index.sai.memory.MemtableIndex.writeDirect(MemtableIndex.java:110)
>         at 
> org.apache.cassandra.index.sai.disk.v1.MemtableIndexWriter.flushVectorIndex(MemtableIndexWriter.java:192)
>         at 
> org.apache.cassandra.index.sai.disk.v1.MemtableIndexWriter.complete(MemtableIndexWriter.java:117)
>         at 
> org.apache.cassandra.index.sai.disk.StorageAttachedIndexWriter.complete(StorageAttachedIndexWriter.java:185)
>         at java.base/java.util.ArrayList.forEach(ArrayList.java:1541)
>         at 
> java.base/java.util.Collections$UnmodifiableCollection.forEach(Collections.java:1085)
>         at 
> org.apache.cassandra.io.sstable.format.SSTableWriter.commit(SSTableWriter.java:289)
>         at 
> org.apache.cassandra.db.compaction.unified.ShardedMultiWriter.commit(ShardedMultiWriter.java:219)
>         at 
> org.apache.cassandra.db.ColumnFamilyStore$Flush.flushMemtable(ColumnFamilyStore.java:1323)
>         at 
> org.apache.cassandra.db.ColumnFamilyStore$Flush.run(ColumnFamilyStore.java:1222)
>         at 
> org.apache.cassandra.concurrent.ExecutionFailure$1.run(ExecutionFailure.java:133)
>         at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>         at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>         at 
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>         at java.base/java.lang.Thread.run(Thread.java:829) {code}
> The table created by the script is as follows:
>  
> {noformat}
> CREATE TABLE gpt.docs (
>     partition_id text,
>     row_id text,
>     attributes_blob text,
>     body_blob text,
>     vector vector<float, 1024>,
>     metadata_s map<text, text>,
>     PRIMARY KEY (partition_id, row_id)
> ) WITH CLUSTERING ORDER BY (row_id ASC)
>     AND additional_write_policy = '99p'
>     AND allow_auto_snapshot = true
>     AND bloom_filter_fp_chance = 0.01
>     AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
>     AND cdc = false
>     AND comment = ''
>     AND compaction = {'class': 
> 'org.apache.cassandra.db.compaction.UnifiedCompactionStrategy', 
> 'scaling_parameters': 'T4', 'target_sstable_size': '1GiB'}
>     AND compression = {'chunk_length_in_kb': '16', 'class': 
> 'org.apache.cassandra.io.compress.LZ4Compressor'}
>     AND memtable = 'default'
>     AND crc_check_chance = 1.0
>     AND default_time_to_live = 0
>     AND extensions = {}
>     AND gc_grace_seconds = 864000
>     AND incremental_backups = true
>     AND max_index_interval = 2048
>     AND memtable_flush_period_in_ms = 0
>     AND min_index_interval = 128
>     AND read_repair = 'BLOCKING'
>     AND speculative_retry = '99p';
> CREATE CUSTOM INDEX eidx_metadata_s_docs ON gpt.docs (entries(metadata_s)) 
> USING 'org.apache.cassandra.index.sai.StorageAttachedIndex';
> CREATE CUSTOM INDEX idx_vector_docs ON gpt.docs (vector) USING 
> 'org.apache.cassandra.index.sai.StorageAttachedIndex';{noformat}
> Thank you
>  


