[ 
https://issues.apache.org/jira/browse/CASSANDRA-18443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jon Meredith updated CASSANDRA-18443:
-------------------------------------
          Fix Version/s: 4.1.2
                         4.0.10
                             (was: 4.0.x)
                             (was: 4.1.x)
    Source Control Link: 
https://github.com/apache/cassandra/commit/cd9bed0aeadd94136a8a6c6ed284cc4684b0666c
             Resolution: Fixed
                 Status: Resolved  (was: Ready to Commit)

> Deadlock updating sstable metadata if disk boundaries need reloading
> --------------------------------------------------------------------
>
>                 Key: CASSANDRA-18443
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-18443
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Local/Compaction, Local/Memtable, Local/SSTable
>            Reporter: Jon Meredith
>            Assignee: Jon Meredith
>            Priority: Normal
>             Fix For: 5.0, 4.1.2, 4.0.10
>
>
> {{CompactionStrategyManager.handleNotification}} holds the read lock while 
> processing notifications. When handling metadata changed notifications, an 
> extra call is made to {{maybeReloadDiskBoundaries}}, which tries to acquire 
> the write lock. Because the same thread already holds the read lock, the 
> write lock can never be granted and the thread deadlocks.
> Partial stack trace:
> {code}
>         at jdk.internal.misc.Unsafe.park([email protected]/Native Method)
>         - parking to wait for  <0x00000005cc000078> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
>         at java.util.concurrent.locks.LockSupport.park
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire
>         at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock
>         at org.apache.cassandra.db.compaction.CompactionStrategyManager.maybeReloadDiskBoundaries(CompactionStrategyManager.java:495)
>         at org.apache.cassandra.db.compaction.CompactionStrategyManager.getCompactionStrategyFor(CompactionStrategyManager.java:343)
>         at org.apache.cassandra.db.compaction.CompactionStrategyManager.handleMetadataChangedNotification(CompactionStrategyManager.java:796)
>         at org.apache.cassandra.db.compaction.CompactionStrategyManager.handleNotification(CompactionStrategyManager.java:838)
>         at org.apache.cassandra.db.lifecycle.Tracker.notifySSTableMetadataChanged(Tracker.java:482)
>         at org.apache.cassandra.db.compaction.CompactionStrategyManager.handleNotification(CompactionStrategyManager.java:838)
> {code}
> Deadlocking with the read lock held blocks the SlabPoolCleaner while it is 
> notifying ColumnFamilyStore, so memtables are prevented from being flushed 
> and recycled. Any thread applying a mutation to the database (at least 
> GossipStage and MutationStage) then stalls, causing the node to be 
> considered down by peers and/or to back up with pending requests.
> All the cases investigated were during single sstable upleveling by 
> {{org.apache.cassandra.db.compaction.SingleSSTableLCSTask}} added in 
> CASSANDRA-12526.
> Other, less critical work was also affected: JMX calls to get the estimated 
> remaining compaction tasks, the index summary manager redistributing 
> summaries, the StatusLogger trying to log dropped messages, and the 
> ValidationManager.
> The workaround is to reboot the affected host.
> The fix is to remove the redundant disk boundary reload check on that 
> path.
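
The self-deadlock described in the issue comes from a documented property of
java.util.concurrent.locks.ReentrantReadWriteLock: a thread that holds the
read lock cannot upgrade to the write lock. The following is a minimal,
standalone sketch of that behaviour and is not Cassandra code. The class and
method names (ReadToWriteUpgradeDemo, acquiresWriteWhileHoldingRead,
acquiresWriteAfterReleasingRead) are hypothetical, and a timed tryLock is
used instead of lock() so that the demo terminates rather than parking
forever as in the stack trace.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Illustrative only: names are hypothetical and unrelated to Cassandra's classes.
public class ReadToWriteUpgradeDemo {

    // Timed attempt on the write lock; releases it again if it was granted.
    private static boolean tryWrite(ReentrantReadWriteLock lock) {
        try {
            boolean acquired = lock.writeLock().tryLock(200, TimeUnit.MILLISECONDS);
            if (acquired) {
                lock.writeLock().unlock();
            }
            return acquired;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    // Mirrors the buggy path: the write lock is requested while this same
    // thread still holds the read lock. A plain writeLock().lock() would
    // block forever here; the timed attempt simply gives up.
    public static boolean acquiresWriteWhileHoldingRead() {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        lock.readLock().lock();
        try {
            return tryWrite(lock);
        } finally {
            lock.readLock().unlock();
        }
    }

    // Safe ordering: the read lock is released before the write lock is requested.
    public static boolean acquiresWriteAfterReleasingRead() {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        lock.readLock().lock();
        lock.readLock().unlock();
        return tryWrite(lock);
    }

    public static void main(String[] args) {
        System.out.println("write lock while holding read lock: "
                + acquiresWriteWhileHoldingRead());
        System.out.println("write lock after releasing read lock: "
                + acquiresWriteAfterReleasingRead());
    }
}
```

The first call reports false and the second reports true. This is why
dropping the write-lock acquisition from the notification path, as the fix
does by removing the redundant reload check, avoids the deadlock.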


