[
https://issues.apache.org/jira/browse/CASSANDRA-20141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18024748#comment-18024748
]
Benedict Elliott Smith commented on CASSANDRA-20141:
----------------------------------------------------
Hi Robert, I suggest asking for help on the dev list or Slack.
> Unresponsive node after ingesting large amounts of vectors
> ----------------------------------------------------------
>
> Key: CASSANDRA-20141
> URL: https://issues.apache.org/jira/browse/CASSANDRA-20141
> Project: Apache Cassandra
> Issue Type: Bug
> Components: Local/Memtable
> Reporter: Robert Knutsson
> Assignee: Robert Knutsson
> Priority: Normal
> Fix For: 5.0.x, 5.x
>
>
> *Background:*
> We have a Cassandra 5.0.2 cluster running on Java 17. We've tried cluster
> sizes from 3 to 23 nodes (running in AWS on r7i.4xlarge instances).
> We have a table with an id column of type TEXT and another column of type
> VECTOR <FLOAT, 256>.
> On that table we also have an SAI index on the VECTOR column with the options
> { 'similarity_function': 'EUCLIDEAN' }.
> *When:*
> When we ingest large amounts of embeddings (~200 million), we notice every
> time that a node becomes unresponsive before all embeddings are saved
> (after >20 million are ingested), and the node is unable to rejoin the
> cluster.
> If the index is removed before we ingest the data, everything is persisted
> properly, but once the index is added (and created successfully),
> the same thing happens again as soon as we continue writing more embeddings
> to the cluster.
> *What:*
> We saw the following stacktrace in our logs:
> {noformat}
> java.lang.NullPointerException: Cannot invoke "java.lang.Boolean.booleanValue()" because "res" is null
>   at org.apache.cassandra.utils.memory.MemtableCleanerThread$Clean.apply(MemtableCleanerThread.java:97)
>   at org.apache.cassandra.utils.concurrent.ListenerList$CallbackBiConsumerListener.run(ListenerList.java:244)
>   at org.apache.cassandra.concurrent.ImmediateExecutor.execute(ImmediateExecutor.java:140)
>   at org.apache.cassandra.utils.concurrent.ListenerList.safeExecute(ListenerList.java:166)
>   at org.apache.cassandra.utils.concurrent.ListenerList.notifyListener(ListenerList.java:157)
>   at org.apache.cassandra.utils.concurrent.ListenerList$CallbackBiConsumerListener.notifySelf(ListenerList.java:250)
>   at org.apache.cassandra.utils.concurrent.ListenerList.lambda$notifyExclusive$0(ListenerList.java:124)
>   at org.apache.cassandra.utils.concurrent.IntrusiveStack.forEach(IntrusiveStack.java:195)
>   at org.apache.cassandra.utils.concurrent.ListenerList.notifyExclusive(ListenerList.java:124)
>   at org.apache.cassandra.utils.concurrent.ListenerList.notify(ListenerList.java:96)
>   at org.apache.cassandra.utils.concurrent.AsyncFuture.trySet(AsyncFuture.java:104)
>   at org.apache.cassandra.utils.concurrent.AbstractFuture.tryFailure(AbstractFuture.java:148)
>   at org.apache.cassandra.utils.concurrent.AsyncPromise.tryFailure(AsyncPromise.java:139)
>   at org.apache.cassandra.db.memtable.AbstractAllocatorMemtable.lambda$flushLargestMemtable$0(AbstractAllocatorMemtable.java:306)
>   at org.apache.cassandra.concurrent.ImmediateExecutor.execute(ImmediateExecutor.java:140)
>   at org.apache.cassandra.utils.concurrent.ListenerList.safeExecute(ListenerList.java:166)
>   at org.apache.cassandra.utils.concurrent.ListenerList.notifyListener(ListenerList.java:157)
>   at org.apache.cassandra.utils.concurrent.ListenerList$RunnableWithExecutor.notifySelf(ListenerList.java:345)
>   at org.apache.cassandra.utils.concurrent.ListenerList.lambda$notifyExclusive$0(ListenerList.java:124)
>   at org.apache.cassandra.utils.concurrent.IntrusiveStack.forEach(IntrusiveStack.java:195)
>   at org.apache.cassandra.utils.concurrent.ListenerList.notifyExclusive(ListenerList.java:124)
>   at org.apache.cassandra.utils.concurrent.ListenerList.notify(ListenerList.java:96)
>   at org.apache.cassandra.utils.concurrent.AsyncFuture.trySet(AsyncFuture.java:104)
>   at org.apache.cassandra.utils.concurrent.AbstractFuture.tryFailure(AbstractFuture.java:148)
>   at org.apache.cassandra.concurrent.FutureTask.tryFailure(FutureTask.java:87)
>   at org.apache.cassandra.concurrent.FutureTask.run(FutureTask.java:75)
>   at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
>   at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
>   at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>   at java.base/java.lang.Thread.run(Thread.java:840)
> {noformat}
> This leads me to believe the above NPE is thrown when the Memtables are
> about to be cleaned (persisted as SSTables?).
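
Note on the NPE above: the trace reaches MemtableCleanerThread$Clean.apply through AsyncPromise.tryFailure, which suggests (but does not prove) that the callback was invoked for a failed flush, so its result argument was null and was then auto-unboxed. If that reading is right, the NPE would be masking an earlier failure that is likely the more useful thing to find in the logs. The following is a minimal, standalone sketch of that failure mode only. It is not Cassandra's code; the class and method names (UnboxingSketch, unsafe, safe) are hypothetical.

```java
// Hypothetical illustration; none of these names come from Cassandra.
final class UnboxingSketch {

    // Shaped like a (result, failure) completion callback.
    // "if (res)" auto-unboxes via res.booleanValue(), which throws
    // NullPointerException when the callback is invoked with res == null,
    // as happens when a future completes with a failure instead of a value.
    static String unsafe(Boolean res, Throwable failure) {
        if (res) {
            return "needs another pass";
        }
        return "done";
    }

    // Same callback shape, but the failure is checked first and the
    // Boolean is compared without unboxing, so null is tolerated.
    static String safe(Boolean res, Throwable failure) {
        if (failure != null) {
            return "failed: " + failure.getMessage();
        }
        if (Boolean.TRUE.equals(res)) {
            return "needs another pass";
        }
        return "done";
    }

    public static void main(String[] args) {
        Throwable flushError = new RuntimeException("flush failed");

        // The null-safe variant reports the original failure.
        System.out.println(safe(null, flushError));

        // The unboxing variant hides it behind a NullPointerException.
        try {
            unsafe(null, flushError);
        } catch (NullPointerException e) {
            System.out.println("NPE instead of the original failure");
        }
    }
}
```

In the unsafe variant the original exception never surfaces, which matches the symptom in the report: only the NPE is visible, not whatever caused the flush to fail.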