[ 
https://issues.apache.org/jira/browse/CASSANDRA-13422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ariel Weisberg updated CASSANDRA-13422:
---------------------------------------
    Description: 
{{getNextBackgroundTask}} in various compaction strategies (definitely {{LCS}}) 
relies on checking the result of {{DataTracker.getCompacting()}} to avoid 
accessing data and metadata related to tables that have already had their 
resources released.

There is a race where this check is unreliable and will claim that a table 
whose resources have already been released is not compacting, resulting in a 
use after free.

[{{LeveledCompactionStrategy.findDroppableSSTable}}|https://github.com/apache/cassandra/blob/c794d2bed7ca1d10e13c4da08a3d45f5c755c1d8/src/java/org/apache/cassandra/db/compaction/LeveledCompactionStrategy.java#L504], 
for instance, has a three-part logical && condition whose first check is 
against the compacting set before calling {{worthDroppingTombstones}}, which 
fails if the table has been released.
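
Roughly, the check ordering looks like the following simplified sketch (not the 
exact code; {{candidates}} and {{tombstoneThreshold}} are illustrative 
stand-ins):

{code:java}
// Simplified sketch of the check ordering in findDroppableSSTable;
// 'candidates' and 'tombstoneThreshold' are illustrative stand-ins.
Set<SSTableReader> compacting = cfs.getTracker().getCompacting();
for (SSTableReader sstable : candidates)
{
    if (!compacting.contains(sstable)                                                  // 1. skip tables believed to be compacting
        && sstable.getEstimatedDroppableTombstoneRatio(gcBefore) > tombstoneThreshold  // 2. enough droppable tombstones
        && worthDroppingTombstones(sstable, gcBefore))                                  // 3. reads sstable metadata; unsafe if already released
        return sstable;
}
return null;
{code}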

The order of events is basically that {{CompactionStrategyManager}} acquires 
the read lock in {{getNextBackgroundTask()}}, then eventually proceeds to 
{{findDroppableSSTable}} and acquires a set of SSTables from the manifest. 
While the manifest is thread safe, it is not accessed atomically with respect 
to other operations. Once it has acquired the set of tables, it acquires 
(again not atomically) the set of compacting SSTables and iterates over the 
former, checking each against the latter.
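
A minimal sketch of that reader side, with the lock acquisition flattened into 
one place (the lock field and {{getCandidatesFromManifest()}} are placeholders 
for the real call chain; only the identifiers already named above come from the 
actual code):

{code:java}
// Reader side: generating the next background task. The two snapshots in
// steps 1 and 2 are taken under the read lock but are not atomic with each other.
readLock.lock();
try
{
    Collection<SSTableReader> candidates = getCandidatesFromManifest();  // 1. candidate sstables from the LCS manifest
    Set<SSTableReader> compacting = cfs.getTracker().getCompacting();    // 2. separately, the compacting set
    // 3. iterate 'candidates', skipping anything present in 'compacting'
    //    (the three-part check shown above)
}
finally
{
    readLock.unlock();
}
{code}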

Meanwhile, other compaction threads are marking tables obsolete or compacted 
and releasing their references. Doing this removes them from {{DataTracker}} 
and publishes a notification to the strategies, but handling this notification 
only requires the read lock. After the compaction thread has published the 
notifications, it eventually marks the table as not compacting in 
{{DataTracker}} or removes it entirely.
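
Sketched with placeholder helper names (the real path goes through the 
lifecycle transaction and tracker notification machinery), the ordering on 
that side is:

{code:java}
// Writer side: a compaction thread finishing a compaction (placeholder names).
releaseAndRemoveFromTracker(finished);  // 1. resources released, tables removed from DataTracker
notifyStrategies(finished);             // 2. CompactionStrategyManager handles this holding only the READ lock
unmarkCompacting(finished);             // 3. only now do the tables leave the compacting set
{code}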

The race is then that the compaction thread generating a new background task 
acquires the SSTables from the manifest on the stack. Any table in that set 
that was compacting at that time must remain compacting so that it can be 
skipped. Another compaction thread finishes a compaction and is able to remove 
the table from the manifest and then remove it from the compacting set. The 
thread generating the background task then acquires the list of compacting 
tables, which no longer includes the table it is supposed to skip.
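
The same interleaving can be modeled outside Cassandra with a deliberately 
sequenced toy example (none of these names are Cassandra classes; the ordering 
is forced with start()/join() to make the race deterministic). It shows why 
holding the read lock on both sides does not make the reader's two snapshots 
atomic:

{code:java}
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Toy model of the race: a "manifest" set and a "compacting" set, a writer
// that only takes the READ lock (like the notification path), and a reader
// that snapshots the two sets non-atomically.
public class ReadLockRaceDemo
{
    static final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    static final Set<String> manifest = ConcurrentHashMap.newKeySet();
    static final Set<String> compacting = ConcurrentHashMap.newKeySet();

    public static void main(String[] args) throws InterruptedException
    {
        manifest.add("table-1");
        compacting.add("table-1");

        Thread writer = new Thread(() -> {
            lock.readLock().lock();            // read locks do not exclude each other
            try
            {
                manifest.remove("table-1");    // W1: table released and removed
                compacting.remove("table-1");  // W2: only afterwards leaves the compacting set
            }
            finally
            {
                lock.readLock().unlock();
            }
        });

        lock.readLock().lock();
        try
        {
            Set<String> candidates = Set.copyOf(manifest);  // R1: snapshot of candidates, still contains table-1
            writer.start();                                 // writer runs between the reader's two snapshots
            writer.join();
            for (String t : candidates)
                if (!compacting.contains(t))                // R2: table-1 now looks "not compacting"
                    System.out.println("would touch released table: " + t);  // R3: use after free in the real code
        }
        finally
        {
            lock.readLock().unlock();
        }
    }
}
{code}

The point of the toy is only that two reads taken under a shared read lock are 
not atomic with respect to another read-lock holder's updates; exclusion 
requires the updating side to take the write lock.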

The simple fix appears to be to require threads to acquire the write lock in 
order to publish notifications of tables being added to/removed from compaction 
strategies. While the write lock is held for the notification, it won't be 
possible for a reader to observe a view of the manifest in which a table that 
is still compacting appears not to be compacting.
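
A minimal sketch of that change, assuming the existing read/write lock in 
{{CompactionStrategyManager}} (field and handler names approximate):

{code:java}
// Sketch of the proposed fix: handle add/remove notifications under the WRITE
// lock so a concurrent getNextBackgroundTask() cannot observe the intermediate
// state between the manifest update and the compacting-set update.
public void handleNotification(INotification notification, Object sender)
{
    writeLock.lock();   // previously only the read lock was taken here
    try
    {
        // dispatch to the existing add/remove/list-changed handlers
    }
    finally
    {
        writeLock.unlock();
    }
}
{code}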

> CompactionStrategyManager should take write not read lock when handling 
> add/remove notifications
> ------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-13422
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13422
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Local Write-Read Paths
>            Reporter: Ariel Weisberg
>            Assignee: Ariel Weisberg
>             Fix For: 4.0, 3.11.x
>



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
