Ke Han created CASSANDRA-18858:
----------------------------------
Summary: AssertionError: Unknown keyspace after dropping a
keyspace combined with drain operation
Key: CASSANDRA-18858
URL: https://issues.apache.org/jira/browse/CASSANDRA-18858
Project: Cassandra
Issue Type: Bug
Reporter: Ke Han
Attachments: system.log
After dropping a keyspace and then performing a drain, I got the following
ERROR message:
{code:java}
rzayqyaovyytmgo],droppedColumns={},triggers=[],indexes=[]]
INFO [MigrationStage:1] 2023-09-09 02:06:40,500 ColumnFamilyStore.java:432 - Initializing uuid00bd53ebd0c84be39baf96c086b9f8d0.hlknh
INFO [Native-Transport-Requests-6] 2023-09-09 02:06:40,549 MigrationManager.java:362 - Drop Keyspace 'uuid00bd53ebd0c84be39baf96c086b9f8d0'
INFO [RMI TCP Connection(2)-127.0.0.1] 2023-09-09 02:07:13,353 StorageService.java:1679 - DRAINING: starting drain process
INFO [RMI TCP Connection(2)-127.0.0.1] 2023-09-09 02:07:13,358 HintsService.java:210 - Paused hints dispatch
INFO [RMI TCP Connection(2)-127.0.0.1] 2023-09-09 02:07:13,364 Server.java:179 - Stop listening for CQL clients
INFO [RMI TCP Connection(2)-127.0.0.1] 2023-09-09 02:07:13,365 Gossiper.java:1747 - Announcing shutdown
INFO [RMI TCP Connection(2)-127.0.0.1] 2023-09-09 02:07:13,369 StorageService.java:2604 - Node /192.168.7.2 state jump to shutdown
INFO [RMI TCP Connection(2)-127.0.0.1] 2023-09-09 02:07:15,373 MessagingService.java:985 - Waiting for messaging service to quiesce
INFO [ACCEPT-/192.168.7.2] 2023-09-09 02:07:15,375 MessagingService.java:1346 - MessagingService has terminated the accept() thread
ERROR [RMI TCP Connection(2)-127.0.0.1] 2023-09-09 02:07:15,384 StorageService.java:4889 - Caught an exception while draining
java.lang.AssertionError: Unknown keyspace uuid00bd53ebd0c84be39baf96c086b9f8d0
    at org.apache.cassandra.db.Keyspace.<init>(Keyspace.java:316)
    at org.apache.cassandra.db.Keyspace.open(Keyspace.java:129)
    at org.apache.cassandra.db.Keyspace.open(Keyspace.java:106)
    at org.apache.cassandra.db.Keyspace$1.apply(Keyspace.java:92)
    at org.apache.cassandra.db.Keyspace$1.apply(Keyspace.java:89)
    at com.google.common.collect.Iterators$8.transform(Iterators.java:799)
    at com.google.common.collect.TransformedIterator.next(TransformedIterator.java:48)
    at org.apache.cassandra.service.StorageService.drain(StorageService.java:4824)
    at org.apache.cassandra.service.StorageService.drain(StorageService.java:4749)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:72)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:276)
    at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:112)
    at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:46)
    at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:237)
    at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:138)
    at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:252)
    at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:819)
    at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:801)
    at javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1468)
    at javax.management.remote.rmi.RMIConnectionImpl.access$300(RMIConnectionImpl.java:76)
    at javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1309)
    at javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1401)
    at javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:829)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:357)
    at sun.rmi.transport.Transport$1.run(Transport.java:200)
    at sun.rmi.transport.Transport$1.run(Transport.java:197)
    at java.security.AccessController.doPrivileged(Native Method)
    at sun.rmi.transport.Transport.serviceCall(Transport.java:196)
    at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:573)
    at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:834)
    at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.lambda$run$0(TCPTransport.java:688)
    at java.security.AccessController.doPrivileged(Native Method)
    at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:687)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:750)
INFO [RMI TCP Connection(4)-127.0.0.1] 2023-09-09 02:07:17,953 CassandraDaemon.java:576 - Cassandra shutting down...
INFO [main] 2023-09-09 02:07:26,619 YamlConfigurationLoader.java:104 - Configuration location: file:/etc/apache-cassandra-4.1.3/cassandra.yaml
INFO [main] 2023-09-09 02:07:27,094 Config.java:1171 - Node configuration:[allocate_tokens_for_keyspace=null;
{code}
This is a concurrency problem related to the schema metadata update.
Two threads interleave:
# thread1 (metadata update thread): updates the keyspace metadata
# thread2 (drain): flushes all in-memory data to disk
Here is how it happens in detail.
Step1: The user issues a drop keyspace command, but the metadata update is
delayed.
Step2: Thread2 (drain) iterates over all non-system keyspaces in order to flush
them:
{code:java}
protected synchronized void drain(boolean isFinalShutdown) throws IOException, InterruptedException, ExecutionException
{
    for (Keyspace keyspace : Keyspace.nonSystem())
    {
        for (ColumnFamilyStore cfs : keyspace.getColumnFamilyStores())
            flushes.add(cfs.forceFlush());
    }
{code}
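Keyspace.nonSystem() iterates over a copy of the non-system keyspace names and
opens each keyspace lazily (see Keyspace$1.apply in the stack trace). The copy
comes from Schema: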
{code:java}
// Schema.java
public List<String> getNonSystemKeyspaces()
{
    return ImmutableList.copyOf(getNonSystemKeyspacesSet());
}

private Set<String> getNonSystemKeyspacesSet()
{
    return Sets.difference(keyspaces.keySet(), SchemaConstants.LOCAL_SYSTEM_KEYSPACE_NAMES);
}
{code}
Step3: Thread1 (metadata update thread) applies the update and removes the
keyspace.
Step4: Thread2 keeps iterating over the stale copy of the keyspace names and
hits the AssertionError when it tries to open the dropped keyspace.
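The interleaving in steps 2 to 4 can be replayed deterministically in a toy model. This is only a sketch under stated assumptions: StaleSnapshotDemo, snapshotNames(), open() and replay() are hypothetical names, a plain map stands in for the schema, and an IllegalStateException stands in for the AssertionError thrown by the real Keyspace constructor.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Toy model of the race. None of these names are Cassandra classes or methods.
public class StaleSnapshotDemo
{
    // Stand-in for the live schema: keyspace name -> table names.
    static final Map<String, List<String>> keyspaces = new ConcurrentHashMap<>();

    // Same copy-then-iterate pattern as getNonSystemKeyspaces(): a snapshot of the names.
    static List<String> snapshotNames()
    {
        return new ArrayList<>(keyspaces.keySet());
    }

    // Stand-in for opening a keyspace: fails if the keyspace is no longer in the schema.
    static List<String> open(String name)
    {
        List<String> tables = keyspaces.get(name);
        if (tables == null)
            throw new IllegalStateException("Unknown keyspace " + name);
        return tables;
    }

    // Replays steps 2-4 in a fixed order; returns the failure message, or null if none.
    static String replay()
    {
        keyspaces.clear();
        keyspaces.put("ks1", List.of("t1"));
        keyspaces.put("ks2", List.of("t2"));

        List<String> names = snapshotNames(); // step 2: drain takes its copy
        keyspaces.remove("ks2");              // step 3: the delayed drop is applied

        try
        {
            for (String name : names)         // step 4: drain walks the stale copy
                open(name);
            return null;
        }
        catch (IllegalStateException e)
        {
            return e.getMessage();
        }
    }

    public static void main(String[] args)
    {
        System.out.println(replay()); // prints "Unknown keyspace ks2"
    }
}
```

The model runs on a single thread, so the ordering is forced rather than raced; in the real node the same ordering depends on the drop being applied between the copy and the lazy open.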
The symptom is similar to CASSANDRA-18636, but the thread interleaving required
to trigger it is different. For CASSANDRA-18636, it is the interleaving between
the _thread updating the metadata_ and the _thread estimating the size_. For
this case, it is the interleaving between the _thread updating the metadata_
and the _thread performing the drain_.
They might be fixed together by the
[patch|https://github.com/apache/cassandra/pull/2457] by [Stefan
Miklosovic|https://issues.apache.org/jira/secure/ViewProfile.jspa?name=stefan.miklosovic],
provided it covers every reference to the stale keyspace copies. (The previous
patch targets only one interleaving; this ticket shows that there could be
other cases.)
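One possible shape for a fix that covers every such reference is to tolerate names that have disappeared since the snapshot was taken. The sketch below is an illustration only and does not describe what the linked patch does: TolerantDrainSketch and tablesToFlush() are hypothetical names, and a plain map again stands in for the schema.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: skip keyspaces that were dropped after the snapshot was taken.
public class TolerantDrainSketch
{
    // Stand-in for the live schema: keyspace name -> table names.
    static final Map<String, List<String>> keyspaces = new ConcurrentHashMap<>();

    // Returns the tables that would be flushed for the given snapshot of names.
    static List<String> tablesToFlush(List<String> snapshot)
    {
        List<String> result = new ArrayList<>();
        for (String name : snapshot)
        {
            // A single lookup avoids a check-then-act gap between "exists?" and "open".
            List<String> tables = keyspaces.get(name);
            if (tables == null)
                continue; // dropped concurrently: nothing left to flush
            result.addAll(tables);
        }
        return result;
    }

    public static void main(String[] args)
    {
        keyspaces.put("ks1", List.of("t1"));
        keyspaces.put("ks2", List.of("t2"));
        List<String> snapshot = List.of("ks1", "ks2"); // copy taken before the drop
        keyspaces.remove("ks2");                       // drop applied afterwards
        System.out.println(tablesToFlush(snapshot));   // prints [t1]
    }
}
```

Skipping is safe in this model because a dropped keyspace has nothing left to flush; whether that holds for every caller that iterates a keyspace snapshot in Cassandra would need to be checked case by case.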
I have attached the system.log file.