[
https://issues.apache.org/jira/browse/CASSANDRA-18858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Brandon Williams updated CASSANDRA-18858:
-----------------------------------------
Resolution: Not A Problem
Status: Resolved (was: Triage Needed)
bq. But this error message is still thrown in the log. It would be better if
this error message is handled invisibly to users or maybe degraded to a warning
level so that users don't get confused about whether the system drain operation
executes successfully
The user caused an error so having one logged makes sense. The error is in the
JMX thread so they received the exception on the other side and there should be
no need to check the log nor any confusion if they do.
> AssertionError: Unknown keyspace after dropping a keyspace combined with
> drain operation
> ----------------------------------------------------------------------------------------
>
> Key: CASSANDRA-18858
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18858
> Project: Cassandra
> Issue Type: Bug
> Reporter: Ke Han
> Priority: Normal
> Attachments: system.log
>
>
> After dropping the keyspace and then perform the drain, I met the following
> ERROR message
> {code:java}
> rzayqyaovyytmgo],droppedColumns={},triggers=[],indexes=[]]
> INFO [MigrationStage:1] 2023-09-09 02:06:40,500 ColumnFamilyStore.java:432 -
> Initializing uuid00bd53ebd0c84be39baf96c086b9f8d0.hlknh
> INFO [Native-Transport-Requests-6] 2023-09-09 02:06:40,549
> MigrationManager.java:362 - Drop Keyspace
> 'uuid00bd53ebd0c84be39baf96c086b9f8d0'
> INFO [RMI TCP Connection(2)-127.0.0.1] 2023-09-09 02:07:13,353
> StorageService.java:1679 - DRAINING: starting drain process
> INFO [RMI TCP Connection(2)-127.0.0.1] 2023-09-09 02:07:13,358
> HintsService.java:210 - Paused hints dispatch
> INFO [RMI TCP Connection(2)-127.0.0.1] 2023-09-09 02:07:13,364
> Server.java:179 - Stop listening for CQL clients
> INFO [RMI TCP Connection(2)-127.0.0.1] 2023-09-09 02:07:13,365
> Gossiper.java:1747 - Announcing shutdown
> INFO [RMI TCP Connection(2)-127.0.0.1] 2023-09-09 02:07:13,369
> StorageService.java:2604 - Node /192.168.7.2 state jump to shutdown
> INFO [RMI TCP Connection(2)-127.0.0.1] 2023-09-09 02:07:15,373
> MessagingService.java:985 - Waiting for messaging service to quiesce
> INFO [ACCEPT-/192.168.7.2] 2023-09-09 02:07:15,375
> MessagingService.java:1346 - MessagingService has terminated the accept()
> thread
> ERROR [RMI TCP Connection(2)-127.0.0.1] 2023-09-09 02:07:15,384
> StorageService.java:4889 - Caught an exception while draining
> java.lang.AssertionError: Unknown keyspace
> uuid00bd53ebd0c84be39baf96c086b9f8d0
> at org.apache.cassandra.db.Keyspace.<init>(Keyspace.java:316)
> at org.apache.cassandra.db.Keyspace.open(Keyspace.java:129)
> at org.apache.cassandra.db.Keyspace.open(Keyspace.java:106)
> at org.apache.cassandra.db.Keyspace$1.apply(Keyspace.java:92)
> at org.apache.cassandra.db.Keyspace$1.apply(Keyspace.java:89)
> at com.google.common.collect.Iterators$8.transform(Iterators.java:799)
> at
> com.google.common.collect.TransformedIterator.next(TransformedIterator.java:48)
> at
> org.apache.cassandra.service.StorageService.drain(StorageService.java:4824)
> at
> org.apache.cassandra.service.StorageService.drain(StorageService.java:4749)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:72)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:276)
> at
> com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:112)
> at
> com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:46)
> at
> com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:237)
> at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:138)
> at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:252)
> at
> com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:819)
> at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:801)
> at
> javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1468)
> at
> javax.management.remote.rmi.RMIConnectionImpl.access$300(RMIConnectionImpl.java:76)
> at
> javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1309)
> at
> javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1401)
> at
> javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:829)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:357)
> at sun.rmi.transport.Transport$1.run(Transport.java:200)
> at sun.rmi.transport.Transport$1.run(Transport.java:197)
> at java.security.AccessController.doPrivileged(Native Method)
> at sun.rmi.transport.Transport.serviceCall(Transport.java:196)
> at
> sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:573)
> at
> sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:834)
> at
> sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.lambda$run$0(TCPTransport.java:688)
> at java.security.AccessController.doPrivileged(Native Method)
> at
> sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:687)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:750)
> INFO [RMI TCP Connection(4)-127.0.0.1] 2023-09-09 02:07:17,953
> CassandraDaemon.java:576 - Cassandra shutting down...
> INFO [main] 2023-09-09 02:07:26,619 YamlConfigurationLoader.java:104 -
> Configuration location: file:/etc/apache-cassandra-4.1.3/cassandra.yaml
> INFO [main] 2023-09-09 02:07:27,094 Config.java:1171 - Node
> configuration:[allocate_tokens_for_keyspace=null; {code}
> This is a concurrency problem related to the metadata update.
> There are two threads interleaving:
> * Thread1 (meta-data update related thread): update the keyspace metadata
> * Thread2 (drain): flush all in-memory data to disk.
> Here's how it happened in detail
> Step1: The user issues a drop keyspace command. However, the update of the
> metadata gets delayed.
> Step2: Thread2 (Drain) tries to iterate all the keyspace for the flushing
> operation
> {code:java}
> protected synchronized void drain(boolean isFinalShutdown) throws
> IOException, InterruptedException, ExecutionException
> {
> for (Keyspace keyspace : Keyspace.nonSystem())
> {
> for (ColumnFamilyStore cfs : keyspace.getColumnFamilyStores())
> flushes.add(cfs.forceFlush());
> }{code}
> keyspace.getColumnFamilyStores() returns a copy of the keyspaces
> {code:java}
> // Schema.java
> public List<String> getNonSystemKeyspaces()
> {
> return ImmutableList.copyOf(getNonSystemKeyspacesSet());
> }
> private Set<String> getNonSystemKeyspacesSet()
> {
> return Sets.difference(keyspaces.keySet(),
> SchemaConstants.LOCAL_SYSTEM_KEYSPACE_NAMES);
> }{code}
> Step3: Thread1 (Metadata update related threads) updates the metadata and
> removes the keyspace
> Step4: Thread2: it keeps iterating the old copy of the keyspaces and runs
> into exceptions since the keyspace has been dropped.
>
> The symptom is similar to CASSANDRA-18636, but the required triggering thread
> interleaving is different => different stack traces. For CASSANDRA-18636,
> it's the interleaving between _thread updating metadata_ and {_}thread
> estimating the size{_}. For this case, it's interleaving between the _thread
> updating the metadata_ and _thread performing the drain._
> They might be fixed together with the
> [patch|https://github.com/apache/cassandra/pull/2457] by [Stefan
> Miklosovic|https://issues.apache.org/jira/secure/ViewProfile.jspa?name=stefan.miklosovic]
> if exhausting all inferences to the old keyspace copies. (The previous patch
> only targets at one interleaving situation, this ticket shows that there
> could be other cases).
> I have attached the system.log file.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]