[ 
https://issues.apache.org/jira/browse/CASSANDRA-18748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haoze Wu updated CASSANDRA-18748:
---------------------------------
    Description: 
In systematic testing of Cassandra stable release 2.0.3, we found that a transient disk failure (a FileNotFoundException) during the flush performed as part of column family creation can leave the cluster in a state where subsequent snapshot repair of that CF blocks forever.

In the workload, we start a cluster of 3 nodes. A client connected to the first node then creates the table. Afterwards, we start 2 clients per node to do reads and writes.

The create request to the first node is then propagated to the second node, which creates the keyspace and column family locally.

However, because of a transient disk failure, a FileNotFoundException can be thrown in MmappedSegmentedFile#createSegments(String path) during this process:
{code:java}
            try
            {
                raf = new RandomAccessFile(path, "r"); // Exception here!!!
            }
            catch (FileNotFoundException e)
            {
                throw new RuntimeException(e);
            } {code}
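For context, here is a hedged sketch of how such a transient open failure could be retried a few times before giving up, rather than failing the flush permanently on the first exception. This is not Cassandra code; the helper name and retry policy are hypothetical.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.RandomAccessFile;

// Hypothetical helper (not part of Cassandra): retry a transient open
// failure with a short backoff before propagating it to the caller.
public class RetryOpen {
    static RandomAccessFile openWithRetry(String path, int attempts) throws IOException {
        FileNotFoundException last = null;
        for (int i = 0; i < attempts; i++) {
            try {
                return new RandomAccessFile(path, "r");
            } catch (FileNotFoundException e) {
                last = e; // possibly transient: back off briefly and retry
                try {
                    Thread.sleep(50L * (i + 1));
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                }
            }
        }
        if (last == null) {
            throw new IOException("no open attempts were made for " + path);
        }
        throw last;
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("segment", ".db");
        try (RandomAccessFile raf = RetryOpen.openWithRetry(f.getPath(), 3)) {
            System.out.println("opened: " + f.getPath());
        }
        f.delete();
    }
}
```

Whether a bounded retry is appropriate here depends on whether the flush path can safely re-attempt the open; this only illustrates the shape of the mitigation.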
In the second node's log: 
{code:java}
2023-08-12 16:09:08,927 - ERROR [FlushWriter:1:CassandraDaemon$2@187] - 
Exception in thread Thread[FlushWriter:1,5,main]
java.lang.RuntimeException: java.io.FileNotFoundException: 
        at 
org.apache.cassandra.io.util.MmappedSegmentedFile$Builder.createSegments(MmappedSegmentedFile.java:183)
        at 
org.apache.cassandra.io.util.MmappedSegmentedFile$Builder.complete(MmappedSegmentedFile.java:168)
        at 
org.apache.cassandra.io.sstable.SSTableWriter.closeAndOpenReader(SSTableWriter.java:316)
        at 
org.apache.cassandra.io.sstable.SSTableWriter.closeAndOpenReader(SSTableWriter.java:306)
        at 
org.apache.cassandra.db.Memtable$FlushRunnable.writeSortedContents(Memtable.java:372)
        at 
org.apache.cassandra.db.Memtable$FlushRunnable.runWith(Memtable.java:320)
        at 
org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48)
        at 
org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.FileNotFoundException: Inject Error!
        at 
org.apache.cassandra.io.util.MmappedSegmentedFile$Builder.createSegments(MmappedSegmentedFile.java:178)
        ... 10 more {code}
Subsequent requests from other nodes then fail because the column family was never created:
{code:java}
2023-08-12 16:09:34,931 - WARN  [Thread-12:IncomingTcpConnection@83] - 
UnknownColumnFamilyException reading from socket; closing
org.apache.cassandra.db.UnknownColumnFamilyException: Couldn't find 
cfId=7f032c76-8e13-34ff-8d56-24fa66dcb6ff
        at 
org.apache.cassandra.db.ColumnFamilySerializer.deserializeCfId(ColumnFamilySerializer.java:178)
        at 
org.apache.cassandra.db.ColumnFamilySerializer.deserialize(ColumnFamilySerializer.java:103)
        at 
org.apache.cassandra.db.RowMutation$RowMutationSerializer.deserializeOneCf(RowMutation.java:304)
        at 
org.apache.cassandra.db.RowMutation$RowMutationSerializer.deserialize(RowMutation.java:284)
        at 
org.apache.cassandra.db.RowMutation$RowMutationSerializer.deserialize(RowMutation.java:312)
        at 
org.apache.cassandra.db.RowMutation$RowMutationSerializer.deserialize(RowMutation.java:254)
        at org.apache.cassandra.net.MessageIn.read(MessageIn.java:99)
        at 
org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:153)
        at 
org.apache.cassandra.net.IncomingTcpConnection.handleModernVersion(IncomingTcpConnection.java:130)
        at 
org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:74)
 {code}
Subsequent read/write requests also fail because the keyspace does not exist:
{code:java}
2023-08-12 16:09:49,333 - ERROR [ReadStage:11:CassandraDaemon$2@187] - 
Exception in thread Thread[ReadStage:11,5,main]
java.lang.AssertionError: Unknown keyspace gray_space
        at org.apache.cassandra.db.Keyspace.<init>(Keyspace.java:262)
        at org.apache.cassandra.db.Keyspace.open(Keyspace.java:110)
        at org.apache.cassandra.db.Keyspace.open(Keyspace.java:88)
        at 
org.apache.cassandra.db.ReadVerbHandler.doVerb(ReadVerbHandler.java:43)
        at 
org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:60)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748) {code}
Furthermore, if we initiate a repair request at node1, we hit the situation described in [CASSANDRA-6415|https://issues.apache.org/jira/browse/CASSANDRA-6415]: node1 blocks in makeSnapshots forever:
{code:java}
"AntiEntropySessions:1" #458 daemon prio=5 os_prio=0 tid=0x00007fa954032800 
nid=0x64c4 waiting on condition [0x00007fa6d218e000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x000000062464e1a0> (a 
java.util.concurrent.CountDownLatch$Sync)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
        at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
        at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
        at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
        at 
org.apache.cassandra.repair.RepairJob.makeSnapshots(RepairJob.java:140)
        at 
org.apache.cassandra.repair.RepairJob.sendTreeRequests(RepairJob.java:109)
        at 
org.apache.cassandra.repair.RepairSession.runMayThrow(RepairSession.java:267)
        at 
org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
        at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748) {code}
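The stack above shows the coordinator parked on a CountDownLatch in RepairJob.makeSnapshots, which is counted down by snapshot responses from the replicas. The following minimal illustration (not Cassandra code; replica counts are made up) shows why the session hangs when one replica never responds, and how a bounded await would at least let the session fail instead of parking forever:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Minimal illustration of the blocking pattern: one latch count per
// expected replica snapshot response. If a replica never replies,
// an unbounded await() parks forever; await(timeout) returns false.
public class SnapshotLatch {
    public static void main(String[] args) throws InterruptedException {
        int replicas = 2;
        CountDownLatch latch = new CountDownLatch(replicas);
        latch.countDown(); // only one replica responds; the failed one never does
        boolean allResponded = latch.await(200, TimeUnit.MILLISECONDS);
        System.out.println("all replicas responded: " + allResponded); // false
    }
}
```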
In CASSANDRA-6455 we propose a longer-term solution for the block, but it only targets the symptom. As for the root cause in this case, it appears that node1 and node3 consider node2 a valid participant for the repair even though the keyspace and column family do not exist on node2, which leads to the inconsistency.

One potential fix for the root cause is to tolerate the failure by recreating the column family afterwards. Alternatively, the other nodes could stop treating the failed node as a valid participant for this column family.
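To make the second proposal concrete, here is a hypothetical sketch (not Cassandra code; all names are made up) of filtering the repair participant set by whether each endpoint actually holds the column family:

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical sketch: before starting a repair session, drop endpoints
// that are known not to hold the column family, so a node whose CF
// creation failed is not treated as a valid participant.
public class ParticipantFilter {
    static Set<String> validParticipants(Set<String> endpoints, Predicate<String> hasCf) {
        Set<String> valid = new HashSet<>();
        for (String ep : endpoints) {
            if (hasCf.test(ep)) {
                valid.add(ep);
            }
        }
        return valid;
    }

    public static void main(String[] args) {
        Set<String> endpoints = new HashSet<>();
        endpoints.add("node1");
        endpoints.add("node2");
        endpoints.add("node3");
        // Assume node2's flush failed and the CF was never created there.
        Set<String> valid = validParticipants(endpoints, ep -> !ep.equals("node2"));
        System.out.println(valid.contains("node2")); // false
    }
}
```

The hard part, of course, is how the coordinator learns that the CF is missing on a peer; this sketch only shows where such a check would slot in.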

Any comments and suggestions would be appreciated.

> Transient disk failure could incur snapshot repair block forever
> ----------------------------------------------------------------
>
>                 Key: CASSANDRA-18748
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-18748
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Haoze Wu
>            Priority: Normal
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
