[
https://issues.apache.org/jira/browse/CASSANDRA-18858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ke Han updated CASSANDRA-18858:
-------------------------------
Description:
After dropping the keyspace and then perform the drain, I met the following
ERROR message
{code:java}
rzayqyaovyytmgo],droppedColumns={},triggers=[],indexes=[]]
INFO [MigrationStage:1] 2023-09-09 02:06:40,500 ColumnFamilyStore.java:432 -
Initializing uuid00bd53ebd0c84be39baf96c086b9f8d0.hlknh
INFO [Native-Transport-Requests-6] 2023-09-09 02:06:40,549
MigrationManager.java:362 - Drop Keyspace 'uuid00bd53ebd0c84be39baf96c086b9f8d0'
INFO [RMI TCP Connection(2)-127.0.0.1] 2023-09-09 02:07:13,353
StorageService.java:1679 - DRAINING: starting drain process
INFO [RMI TCP Connection(2)-127.0.0.1] 2023-09-09 02:07:13,358
HintsService.java:210 - Paused hints dispatch
INFO [RMI TCP Connection(2)-127.0.0.1] 2023-09-09 02:07:13,364 Server.java:179
- Stop listening for CQL clients
INFO [RMI TCP Connection(2)-127.0.0.1] 2023-09-09 02:07:13,365
Gossiper.java:1747 - Announcing shutdown
INFO [RMI TCP Connection(2)-127.0.0.1] 2023-09-09 02:07:13,369
StorageService.java:2604 - Node /192.168.7.2 state jump to shutdown
INFO [RMI TCP Connection(2)-127.0.0.1] 2023-09-09 02:07:15,373
MessagingService.java:985 - Waiting for messaging service to quiesce
INFO [ACCEPT-/192.168.7.2] 2023-09-09 02:07:15,375 MessagingService.java:1346
- MessagingService has terminated the accept() thread
ERROR [RMI TCP Connection(2)-127.0.0.1] 2023-09-09 02:07:15,384
StorageService.java:4889 - Caught an exception while draining
java.lang.AssertionError: Unknown keyspace uuid00bd53ebd0c84be39baf96c086b9f8d0
at org.apache.cassandra.db.Keyspace.<init>(Keyspace.java:316)
at org.apache.cassandra.db.Keyspace.open(Keyspace.java:129)
at org.apache.cassandra.db.Keyspace.open(Keyspace.java:106)
at org.apache.cassandra.db.Keyspace$1.apply(Keyspace.java:92)
at org.apache.cassandra.db.Keyspace$1.apply(Keyspace.java:89)
at com.google.common.collect.Iterators$8.transform(Iterators.java:799)
at
com.google.common.collect.TransformedIterator.next(TransformedIterator.java:48)
at
org.apache.cassandra.service.StorageService.drain(StorageService.java:4824)
at
org.apache.cassandra.service.StorageService.drain(StorageService.java:4749)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:72)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:276)
at
com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:112)
at
com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:46)
at
com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:237)
at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:138)
at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:252)
at
com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:819)
at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:801)
at
javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1468)
at
javax.management.remote.rmi.RMIConnectionImpl.access$300(RMIConnectionImpl.java:76)
at
javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1309)
at
javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1401)
at
javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:829)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:357)
at sun.rmi.transport.Transport$1.run(Transport.java:200)
at sun.rmi.transport.Transport$1.run(Transport.java:197)
at java.security.AccessController.doPrivileged(Native Method)
at sun.rmi.transport.Transport.serviceCall(Transport.java:196)
at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:573)
at
sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:834)
at
sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.lambda$run$0(TCPTransport.java:688)
at java.security.AccessController.doPrivileged(Native Method)
at
sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:687)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
INFO [RMI TCP Connection(4)-127.0.0.1] 2023-09-09 02:07:17,953
CassandraDaemon.java:576 - Cassandra shutting down...
INFO [main] 2023-09-09 02:07:26,619 YamlConfigurationLoader.java:104 -
Configuration location: file:/etc/apache-cassandra-4.1.3/cassandra.yaml
INFO [main] 2023-09-09 02:07:27,094 Config.java:1171 - Node
configuration:[allocate_tokens_for_keyspace=null; {code}
This is a concurrency problem related to the metadata update.
There are two threads interleaving:
* Thread1 (meta-data update related thread): update the keyspace metadata
* Thread2 (drain): flush all in-memory data to disk.
Here's how it happened in detail
Step1: The user issues a drop keyspace command. However, the update of the
metadata gets delayed.
Step2: Thread2 (Drain) tries to iterate all the keyspace for the flushing
operation
{code:java}
protected synchronized void drain(boolean isFinalShutdown) throws IOException,
InterruptedException, ExecutionException
{
for (Keyspace keyspace : Keyspace.nonSystem())
{
for (ColumnFamilyStore cfs : keyspace.getColumnFamilyStores())
flushes.add(cfs.forceFlush());
}{code}
keyspace.getColumnFamilyStores() returns a copy of the keyspaces
{code:java}
// Schema.java
public List<String> getNonSystemKeyspaces()
{
return ImmutableList.copyOf(getNonSystemKeyspacesSet());
}
private Set<String> getNonSystemKeyspacesSet()
{
return Sets.difference(keyspaces.keySet(),
SchemaConstants.LOCAL_SYSTEM_KEYSPACE_NAMES);
}{code}
Step3: Thread1 (Metadata update related threads) updates the metadata and
removes the keyspace
Step4: Thread2: it keeps iterating the old copy of the keyspaces and runs into
exceptions since the keyspace has been dropped.
The symptom is similar to CASSANDRA-18636, but the required triggering thread
interleaving is different => different stack traces. For CASSANDRA-18636, it's
the interleaving between _thread updating metadata_ and {_}thread estimating
the size{_}. For this case, it's interleaving between the _thread updating the
metadata_ and _thread performing the drain._
They might be fixed together with the
[patch|https://github.com/apache/cassandra/pull/2457] by [Stefan
Miklosovic|https://issues.apache.org/jira/secure/ViewProfile.jspa?name=stefan.miklosovic]
if exhausting all inferences to the old keyspace copies. (The previous patch
only targets at one interleaving situation, this ticket shows that there could
be other cases).
I have attached the system.log file.
was:
After dropping the keyspace and then perform the drain, I met the following
ERROR message
{code:java}
rzayqyaovyytmgo],droppedColumns={},triggers=[],indexes=[]]
INFO [MigrationStage:1] 2023-09-09 02:06:40,500 ColumnFamilyStore.java:432 -
Initializing uuid00bd53ebd0c84be39baf96c086b9f8d0.hlknh
INFO [Native-Transport-Requests-6] 2023-09-09 02:06:40,549
MigrationManager.java:362 - Drop Keyspace 'uuid00bd53ebd0c84be39baf96c086b9f8d0'
INFO [RMI TCP Connection(2)-127.0.0.1] 2023-09-09 02:07:13,353
StorageService.java:1679 - DRAINING: starting drain process
INFO [RMI TCP Connection(2)-127.0.0.1] 2023-09-09 02:07:13,358
HintsService.java:210 - Paused hints dispatch
INFO [RMI TCP Connection(2)-127.0.0.1] 2023-09-09 02:07:13,364 Server.java:179
- Stop listening for CQL clients
INFO [RMI TCP Connection(2)-127.0.0.1] 2023-09-09 02:07:13,365
Gossiper.java:1747 - Announcing shutdown
INFO [RMI TCP Connection(2)-127.0.0.1] 2023-09-09 02:07:13,369
StorageService.java:2604 - Node /192.168.7.2 state jump to shutdown
INFO [RMI TCP Connection(2)-127.0.0.1] 2023-09-09 02:07:15,373
MessagingService.java:985 - Waiting for messaging service to quiesce
INFO [ACCEPT-/192.168.7.2] 2023-09-09 02:07:15,375 MessagingService.java:1346
- MessagingService has terminated the accept() thread
ERROR [RMI TCP Connection(2)-127.0.0.1] 2023-09-09 02:07:15,384
StorageService.java:4889 - Caught an exception while draining
java.lang.AssertionError: Unknown keyspace uuid00bd53ebd0c84be39baf96c086b9f8d0
at org.apache.cassandra.db.Keyspace.<init>(Keyspace.java:316)
at org.apache.cassandra.db.Keyspace.open(Keyspace.java:129)
at org.apache.cassandra.db.Keyspace.open(Keyspace.java:106)
at org.apache.cassandra.db.Keyspace$1.apply(Keyspace.java:92)
at org.apache.cassandra.db.Keyspace$1.apply(Keyspace.java:89)
at com.google.common.collect.Iterators$8.transform(Iterators.java:799)
at
com.google.common.collect.TransformedIterator.next(TransformedIterator.java:48)
at
org.apache.cassandra.service.StorageService.drain(StorageService.java:4824)
at
org.apache.cassandra.service.StorageService.drain(StorageService.java:4749)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:72)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:276)
at
com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:112)
at
com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:46)
at
com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:237)
at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:138)
at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:252)
at
com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:819)
at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:801)
at
javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1468)
at
javax.management.remote.rmi.RMIConnectionImpl.access$300(RMIConnectionImpl.java:76)
at
javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1309)
at
javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1401)
at
javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:829)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:357)
at sun.rmi.transport.Transport$1.run(Transport.java:200)
at sun.rmi.transport.Transport$1.run(Transport.java:197)
at java.security.AccessController.doPrivileged(Native Method)
at sun.rmi.transport.Transport.serviceCall(Transport.java:196)
at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:573)
at
sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:834)
at
sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.lambda$run$0(TCPTransport.java:688)
at java.security.AccessController.doPrivileged(Native Method)
at
sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:687)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
INFO [RMI TCP Connection(4)-127.0.0.1] 2023-09-09 02:07:17,953
CassandraDaemon.java:576 - Cassandra shutting down...
INFO [main] 2023-09-09 02:07:26,619 YamlConfigurationLoader.java:104 -
Configuration location: file:/etc/apache-cassandra-4.1.3/cassandra.yaml
INFO [main] 2023-09-09 02:07:27,094 Config.java:1171 - Node
configuration:[allocate_tokens_for_keyspace=null; {code}
This is a concurrency problem related to the metadata update.
There are two threads interleaving:
* Thread1 (meta-data update related thread): update the keyspace metadata
* Thread2 (drain): flush all in-memory data to disk.
Here's how it happened in detail
Step1: The user issues a drop keyspace command. However, the update of the
metadata gets delayed.
Step2: Thread2 (Drain) tries to iterate all the keyspace for the flushing
operation
{code:java}
protected synchronized void drain(boolean isFinalShutdown) throws IOException,
InterruptedException, ExecutionException
{
for (Keyspace keyspace : Keyspace.nonSystem())
{
for (ColumnFamilyStore cfs : keyspace.getColumnFamilyStores())
flushes.add(cfs.forceFlush());
}{code}
keyspace.getColumnFamilyStores() returns a copy of the keyspaces
{code:java}
// Schema.java
public List<String> getNonSystemKeyspaces()
{
return ImmutableList.copyOf(getNonSystemKeyspacesSet());
}
private Set<String> getNonSystemKeyspacesSet()
{
return Sets.difference(keyspaces.keySet(),
SchemaConstants.LOCAL_SYSTEM_KEYSPACE_NAMES);
}{code}
Step3: Thread1 (Metadata update related threads) updates the metadata and
removes the keyspace
Step4: Thread2: it keeps iterating the old copy of the keyspaces and runs into
exceptions since the keyspace has been dropped.
The symptom is similar to CASSANDRA-18636, but the required triggering thread
interleaving is different => different stack traces. For CASSANDRA-18636, it's
the interleaving between _thread updating metadata_ and {_}thread estimating
the size{_}. For this case, it's interleaving between the _thread updating the
metadata_ and _thread performing the drain._
They might be fixed together with the
[patch|https://github.com/apache/cassandra/pull/2457] by [Stefan
Miklosovic|https://issues.apache.org/jira/secure/ViewProfile.jspa?name=stefan.miklosovic]
if exhaust all inferences to the old keyspace copies. (The previous patch only
targets at one interleaving situation, this ticked shows that there could be
other cases).
I have attached the system.log file.
> AssertionError: Unknown keyspace after dropping a keyspace combined with
> drain operation
> ----------------------------------------------------------------------------------------
>
> Key: CASSANDRA-18858
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18858
> Project: Cassandra
> Issue Type: Bug
> Reporter: Ke Han
> Priority: Normal
> Attachments: system.log
>
>
> After dropping the keyspace and then perform the drain, I met the following
> ERROR message
> {code:java}
> rzayqyaovyytmgo],droppedColumns={},triggers=[],indexes=[]]
> INFO [MigrationStage:1] 2023-09-09 02:06:40,500 ColumnFamilyStore.java:432 -
> Initializing uuid00bd53ebd0c84be39baf96c086b9f8d0.hlknh
> INFO [Native-Transport-Requests-6] 2023-09-09 02:06:40,549
> MigrationManager.java:362 - Drop Keyspace
> 'uuid00bd53ebd0c84be39baf96c086b9f8d0'
> INFO [RMI TCP Connection(2)-127.0.0.1] 2023-09-09 02:07:13,353
> StorageService.java:1679 - DRAINING: starting drain process
> INFO [RMI TCP Connection(2)-127.0.0.1] 2023-09-09 02:07:13,358
> HintsService.java:210 - Paused hints dispatch
> INFO [RMI TCP Connection(2)-127.0.0.1] 2023-09-09 02:07:13,364
> Server.java:179 - Stop listening for CQL clients
> INFO [RMI TCP Connection(2)-127.0.0.1] 2023-09-09 02:07:13,365
> Gossiper.java:1747 - Announcing shutdown
> INFO [RMI TCP Connection(2)-127.0.0.1] 2023-09-09 02:07:13,369
> StorageService.java:2604 - Node /192.168.7.2 state jump to shutdown
> INFO [RMI TCP Connection(2)-127.0.0.1] 2023-09-09 02:07:15,373
> MessagingService.java:985 - Waiting for messaging service to quiesce
> INFO [ACCEPT-/192.168.7.2] 2023-09-09 02:07:15,375
> MessagingService.java:1346 - MessagingService has terminated the accept()
> thread
> ERROR [RMI TCP Connection(2)-127.0.0.1] 2023-09-09 02:07:15,384
> StorageService.java:4889 - Caught an exception while draining
> java.lang.AssertionError: Unknown keyspace
> uuid00bd53ebd0c84be39baf96c086b9f8d0
> at org.apache.cassandra.db.Keyspace.<init>(Keyspace.java:316)
> at org.apache.cassandra.db.Keyspace.open(Keyspace.java:129)
> at org.apache.cassandra.db.Keyspace.open(Keyspace.java:106)
> at org.apache.cassandra.db.Keyspace$1.apply(Keyspace.java:92)
> at org.apache.cassandra.db.Keyspace$1.apply(Keyspace.java:89)
> at com.google.common.collect.Iterators$8.transform(Iterators.java:799)
> at
> com.google.common.collect.TransformedIterator.next(TransformedIterator.java:48)
> at
> org.apache.cassandra.service.StorageService.drain(StorageService.java:4824)
> at
> org.apache.cassandra.service.StorageService.drain(StorageService.java:4749)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:72)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:276)
> at
> com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:112)
> at
> com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:46)
> at
> com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:237)
> at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:138)
> at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:252)
> at
> com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:819)
> at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:801)
> at
> javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1468)
> at
> javax.management.remote.rmi.RMIConnectionImpl.access$300(RMIConnectionImpl.java:76)
> at
> javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1309)
> at
> javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1401)
> at
> javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:829)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:357)
> at sun.rmi.transport.Transport$1.run(Transport.java:200)
> at sun.rmi.transport.Transport$1.run(Transport.java:197)
> at java.security.AccessController.doPrivileged(Native Method)
> at sun.rmi.transport.Transport.serviceCall(Transport.java:196)
> at
> sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:573)
> at
> sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:834)
> at
> sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.lambda$run$0(TCPTransport.java:688)
> at java.security.AccessController.doPrivileged(Native Method)
> at
> sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:687)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:750)
> INFO [RMI TCP Connection(4)-127.0.0.1] 2023-09-09 02:07:17,953
> CassandraDaemon.java:576 - Cassandra shutting down...
> INFO [main] 2023-09-09 02:07:26,619 YamlConfigurationLoader.java:104 -
> Configuration location: file:/etc/apache-cassandra-4.1.3/cassandra.yaml
> INFO [main] 2023-09-09 02:07:27,094 Config.java:1171 - Node
> configuration:[allocate_tokens_for_keyspace=null; {code}
> This is a concurrency problem related to the metadata update.
> There are two threads interleaving:
> * Thread1 (meta-data update related thread): update the keyspace metadata
> * Thread2 (drain): flush all in-memory data to disk.
> Here's how it happened in detail
> Step1: The user issues a drop keyspace command. However, the update of the
> metadata gets delayed.
> Step2: Thread2 (Drain) tries to iterate all the keyspace for the flushing
> operation
> {code:java}
> protected synchronized void drain(boolean isFinalShutdown) throws
> IOException, InterruptedException, ExecutionException
> {
> for (Keyspace keyspace : Keyspace.nonSystem())
> {
> for (ColumnFamilyStore cfs : keyspace.getColumnFamilyStores())
> flushes.add(cfs.forceFlush());
> }{code}
> keyspace.getColumnFamilyStores() returns a copy of the keyspaces
> {code:java}
> // Schema.java
> public List<String> getNonSystemKeyspaces()
> {
> return ImmutableList.copyOf(getNonSystemKeyspacesSet());
> }
> private Set<String> getNonSystemKeyspacesSet()
> {
> return Sets.difference(keyspaces.keySet(),
> SchemaConstants.LOCAL_SYSTEM_KEYSPACE_NAMES);
> }{code}
> Step3: Thread1 (Metadata update related threads) updates the metadata and
> removes the keyspace
> Step4: Thread2: it keeps iterating the old copy of the keyspaces and runs
> into exceptions since the keyspace has been dropped.
>
> The symptom is similar to CASSANDRA-18636, but the required triggering thread
> interleaving is different => different stack traces. For CASSANDRA-18636,
> it's the interleaving between _thread updating metadata_ and {_}thread
> estimating the size{_}. For this case, it's interleaving between the _thread
> updating the metadata_ and _thread performing the drain._
> They might be fixed together with the
> [patch|https://github.com/apache/cassandra/pull/2457] by [Stefan
> Miklosovic|https://issues.apache.org/jira/secure/ViewProfile.jspa?name=stefan.miklosovic]
> if exhausting all inferences to the old keyspace copies. (The previous patch
> only targets at one interleaving situation, this ticket shows that there
> could be other cases).
> I have attached the system.log file.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]