[
https://issues.apache.org/jira/browse/CASSANDRA-18748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Haoze Wu updated CASSANDRA-18748:
---------------------------------
Status: Open (was: Triage Needed)
> Transient disk failure could incur snapshot repair block forever
> ----------------------------------------------------------------
>
> Key: CASSANDRA-18748
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18748
> Project: Cassandra
> Issue Type: Bug
> Components: Consistency/Repair
> Reporter: Haoze Wu
> Priority: Normal
>
> We were doing some systematic testing on Cassandra stable release 2.0.3 and
> found that a transient disk failure (a FileNotFoundException) during the flush
> triggered by column family creation can cause subsequent snapshot repairs of
> that CF to block forever.
> In the workload, we start a cluster of 3 nodes. We then start a client against
> the first node to create the table. Afterwards, we start two clients per node
> to perform reads and writes.
> The create request handled by the first node is then propagated to the second
> node, which creates the keyspace and column family there. However, a transient
> disk failure during this process can cause a FileNotFoundException to be
> thrown in MmappedSegmentedFile#createSegments(String path):
> {code:java}
> try
> {
>     raf = new RandomAccessFile(path, "r"); // Exception here!!!
> }
> catch (FileNotFoundException e)
> {
>     throw new RuntimeException(e);
> }
> {code}
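> For context, because the FileNotFoundException is rethrown as a RuntimeException, the FlushWriter thread dies immediately and the transient failure is never retried. A minimal sketch (not Cassandra code; the helper name, retry count, and backoff are illustrative assumptions) of how a bounded retry around the file open could tolerate a transient miss:
> {code:java}
> import java.io.FileNotFoundException;
> import java.io.RandomAccessFile;
>
> public class RetryOpen {
>     /** Try to open the file read-only, retrying a few times on a
>      *  transient FileNotFoundException before giving up.
>      *  Assumes attempts >= 1. */
>     static RandomAccessFile openWithRetry(String path, int attempts)
>             throws FileNotFoundException, InterruptedException {
>         FileNotFoundException last = null;
>         for (int i = 0; i < attempts; i++) {
>             try {
>                 return new RandomAccessFile(path, "r");
>             } catch (FileNotFoundException e) {
>                 last = e;          // remember the most recent failure
>                 Thread.sleep(50);  // brief backoff before retrying
>             }
>         }
>         throw last; // still missing after all attempts: a real failure
>     }
> }
> {code}
> Whether a retry is safe here depends on the flush lifecycle, but it illustrates that the current code turns a possibly transient condition into a fatal one.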
> In the second node's log:
> {code:java}
> 2023-08-12 16:09:08,927 - ERROR [FlushWriter:1:CassandraDaemon$2@187] - Exception in thread Thread[FlushWriter:1,5,main]
> java.lang.RuntimeException: java.io.FileNotFoundException:
> at org.apache.cassandra.io.util.MmappedSegmentedFile$Builder.createSegments(MmappedSegmentedFile.java:183)
> at org.apache.cassandra.io.util.MmappedSegmentedFile$Builder.complete(MmappedSegmentedFile.java:168)
> at org.apache.cassandra.io.sstable.SSTableWriter.closeAndOpenReader(SSTableWriter.java:316)
> at org.apache.cassandra.io.sstable.SSTableWriter.closeAndOpenReader(SSTableWriter.java:306)
> at org.apache.cassandra.db.Memtable$FlushRunnable.writeSortedContents(Memtable.java:372)
> at org.apache.cassandra.db.Memtable$FlushRunnable.runWith(Memtable.java:320)
> at org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48)
> at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: java.io.FileNotFoundException: Inject Error!
> at org.apache.cassandra.io.util.MmappedSegmentedFile$Builder.createSegments(MmappedSegmentedFile.java:178)
> ... 10 more
> {code}
> Subsequent requests from other nodes then fail, because the column family was
> never created on the second node:
> {code:java}
> 2023-08-12 16:09:34,931 - WARN [Thread-12:IncomingTcpConnection@83] - UnknownColumnFamilyException reading from socket; closing
> org.apache.cassandra.db.UnknownColumnFamilyException: Couldn't find cfId=7f032c76-8e13-34ff-8d56-24fa66dcb6ff
> at org.apache.cassandra.db.ColumnFamilySerializer.deserializeCfId(ColumnFamilySerializer.java:178)
> at org.apache.cassandra.db.ColumnFamilySerializer.deserialize(ColumnFamilySerializer.java:103)
> at org.apache.cassandra.db.RowMutation$RowMutationSerializer.deserializeOneCf(RowMutation.java:304)
> at org.apache.cassandra.db.RowMutation$RowMutationSerializer.deserialize(RowMutation.java:284)
> at org.apache.cassandra.db.RowMutation$RowMutationSerializer.deserialize(RowMutation.java:312)
> at org.apache.cassandra.db.RowMutation$RowMutationSerializer.deserialize(RowMutation.java:254)
> at org.apache.cassandra.net.MessageIn.read(MessageIn.java:99)
> at org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:153)
> at org.apache.cassandra.net.IncomingTcpConnection.handleModernVersion(IncomingTcpConnection.java:130)
> at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:74)
> {code}
> Likewise, subsequent read/write requests fail because the keyspace does not exist:
> {code:java}
> 2023-08-12 16:09:49,333 - ERROR [ReadStage:11:CassandraDaemon$2@187] - Exception in thread Thread[ReadStage:11,5,main]
> java.lang.AssertionError: Unknown keyspace gray_space
> at org.apache.cassandra.db.Keyspace.<init>(Keyspace.java:262)
> at org.apache.cassandra.db.Keyspace.open(Keyspace.java:110)
> at org.apache.cassandra.db.Keyspace.open(Keyspace.java:88)
> at org.apache.cassandra.db.ReadVerbHandler.doVerb(ReadVerbHandler.java:43)
> at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:60)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> {code}
> Finally, if we initiate a repair on node1, we encounter the situation described
> in CASSANDRA-6415: node1 blocks in makeSnapshots forever:
> {code:java}
> "AntiEntropySessions:1" #458 daemon prio=5 os_prio=0 tid=0x00007fa954032800 nid=0x64c4 waiting on condition [0x00007fa6d218e000]
>    java.lang.Thread.State: WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for <0x000000062464e1a0> (a java.util.concurrent.CountDownLatch$Sync)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
> at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
> at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
> at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
> at org.apache.cassandra.repair.RepairJob.makeSnapshots(RepairJob.java:140)
> at org.apache.cassandra.repair.RepairJob.sendTreeRequests(RepairJob.java:109)
> at org.apache.cassandra.repair.RepairSession.runMayThrow(RepairSession.java:267)
> at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> {code}
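> The dump shows the AntiEntropySessions thread parked on CountDownLatch.await() with no timeout, so a replica that never acknowledges the snapshot keeps repair blocked indefinitely. A minimal sketch (not the actual Cassandra fix; the helper name and timeout are illustrative assumptions) of the timed-wait pattern that would bound this:
> {code:java}
> import java.util.concurrent.CountDownLatch;
> import java.util.concurrent.TimeUnit;
>
> public class BoundedSnapshotWait {
>     /** Wait for all snapshot acks, but give up after the timeout
>      *  instead of parking forever when a replica never responds.
>      *  Returns true if every ack arrived in time. */
>     static boolean awaitSnapshots(CountDownLatch latch, long seconds)
>             throws InterruptedException {
>         return latch.await(seconds, TimeUnit.SECONDS);
>     }
> }
> {code}
> On a false return, the repair session could fail fast and report the unresponsive endpoint instead of hanging.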
> In CASSANDRA-6455 we propose a longer-term solution for the blocking, but it
> only addresses the symptom. The root cause here appears to be that node1 and
> node3 still treat node2 as a valid participant even though the keyspace and
> column family were never created on node2, which leaves the cluster
> persistently inconsistent.
> One potential fix is to recover by recreating the column family after the
> failure. Alternatively, other nodes could stop treating the failed node as a
> valid participant for this column family and exclude it from subsequent
> activity.
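> The second idea above can be sketched as a simple filtering step before a repair session is built. This is purely illustrative (the endpoint list and the hasColumnFamily predicate are hypothetical, not Cassandra APIs): only replicas that actually report the column family as present would be enlisted.
> {code:java}
> import java.util.List;
> import java.util.function.Predicate;
> import java.util.stream.Collectors;
>
> public class RepairParticipants {
>     /** Hypothetical filter: keep only endpoints on which the column
>      *  family is known to exist before including them in repair. */
>     static List<String> viableEndpoints(List<String> endpoints,
>                                         Predicate<String> hasColumnFamily) {
>         return endpoints.stream()
>                         .filter(hasColumnFamily)
>                         .collect(Collectors.toList());
>     }
> }
> {code}
> In this scenario, node2 would be filtered out, so the snapshot latch would be sized only for replicas that can actually respond.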
> We provide a way to reproduce the issue in this repo:
> https://github.com/tonyPan123/cassandra-18748
> Any comments and suggestions would be appreciated.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]