Wei Deng created CASSANDRA-16047:
------------------------------------

             Summary: Potential race condition in creating hard link when 
incremental backup is turned on
                 Key: CASSANDRA-16047
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-16047
             Project: Cassandra
          Issue Type: Bug
          Components: Local/SSTable
            Reporter: Wei Deng
         Attachments: incremental_backup_hardlink_exception.jpg, 
incremental_backup_hardlink_exception1.jpg

There appears to be a race condition when creating hard links while 
incremental backup is turned on.

The following screenshot was captured in a production cluster running Cassandra 
3.0.15 after incremental backup was turned on. When this 
{{NoSuchFileException}} occurs, the resulting {{FSWriteError}}, combined with 
the default disk failure policy, shuts down the JVM, so this is a fairly 
critical bug.
 !incremental_backup_hardlink_exception.jpg! 

Because of the risk of production database downtime (if a similar issue occurs 
on multiple nodes within a short time frame), incremental backup had to be 
turned off for now, which is not ideal.

!incremental_backup_hardlink_exception1.jpg!

The deployment runs in a public cloud environment on EBS-like, SSD-backed disks 
with decent latency, throughput and IOPS, so the OS or IO layer is unlikely to 
be the culprit. As the second screenshot above shows, the affected table is 
{{system.size_estimates}}, which has low flush traffic, so compaction of the 
source SSTable doesn't seem to be at play here.
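
For reference, the failure mode itself (not its root cause) can be reproduced 
outside Cassandra with plain {{java.nio}}. The sketch below is only an 
illustration under assumptions: the class {{HardLinkSketch}}, its methods and 
the file names are hypothetical and are not taken from the Cassandra code 
base. It shows that {{Files.createLink}} throws {{NoSuchFileException}} when 
the link source has disappeared between being chosen and being linked, which 
is the kind of window a race would open.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;

// Hypothetical stand-alone sketch; not Cassandra code.
final class HardLinkSketch {

    // Tries to hard-link `source` at `link`. Returns false instead of failing
    // when the source (or the link's parent directory) does not exist.
    static boolean tryCreateHardLink(Path link, Path source) {
        try {
            Files.createLink(link, source);
            return true;
        } catch (NoSuchFileException e) {
            // This is the exception named in the report.
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Links a file once while it exists and once after it has been deleted.
    // Returns {resultBeforeDelete, resultAfterDelete}.
    static boolean[] demo() {
        try {
            Path dir = Files.createTempDirectory("backup-sketch");
            Path source = dir.resolve("sketch-Data.db");
            Path backups = Files.createDirectory(dir.resolve("backups"));
            Files.write(source, new byte[] {1, 2, 3});

            boolean before = tryCreateHardLink(backups.resolve("first-Data.db"), source);

            // Simulate another thread removing the source in the meantime.
            Files.delete(source);
            boolean after = tryCreateHardLink(backups.resolve("second-Data.db"), source);

            return new boolean[] {before, after};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        boolean[] results = demo();
        System.out.println("link while source exists: " + results[0]);
        System.out.println("link after source removed: " + results[1]);
    }
}
```

Whether tolerating the missing source like this would be acceptable for 
incremental backup is a separate question; the sketch only demonstrates where 
the exception comes from.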



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

