[https://issues.apache.org/jira/browse/CASSANDRA-13422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15959848#comment-15959848]
Ariel Weisberg commented on CASSANDRA-13422:
--------------------------------------------
||Code|utests|dtests||
|[3.11|https://github.com/apache/cassandra/compare/cassandra-3.11...aweisberg:cassandra-13422-3.11?expand=1]|[utests|https://circleci.com/gh/aweisberg/cassandra/10]| |
|[trunk|https://github.com/apache/cassandra/compare/trunk...aweisberg:cassandra-13422-trunk?expand=1]|[utests|https://circleci.com/gh/aweisberg/cassandra/12]| |
> CompactionStrategyManager should take write not read lock when handling
> add/remove notifications
> ------------------------------------------------------------------------------------------------
>
> Key: CASSANDRA-13422
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13422
> Project: Cassandra
> Issue Type: Bug
> Components: Local Write-Read Paths
> Reporter: Ariel Weisberg
> Assignee: Ariel Weisberg
> Fix For: 4.0, 3.11.x
>
>
> {{getNextBackgroundTask}} in various compaction strategies (definitely
> {{LCS}}) relies on checking the result of {{DataTracker.getCompacting()}} to
> avoid accessing data and metadata related to tables that have already had
> their resources released.
> There is a race where this check is unreliable and will claim that a table
> whose resources have already been released is not compacting, resulting in a
> use-after-free.
> [{{LeveledCompactionStrategy.findDroppableSSTable}}|https://github.com/apache/cassandra/blob/c794d2bed7ca1d10e13c4da08a3d45f5c755c1d8/src/java/org/apache/cassandra/db/compaction/LeveledCompactionStrategy.java#L504]
> for instance has a three-part logical && condition in which the first check,
> against the compacting set, guards the subsequent call to
> {{worthDroppingTombstones}}, which fails if the table has been released.
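As a rough illustration of that pattern (a self-contained sketch, not the actual Cassandra code; only {{getCompacting}} and {{worthDroppingTombstones}} correspond to real names, and the third clause of the condition is elided):

{code:java}
import java.util.Set;

// Minimal model of the guard pattern: the compacting-set check is all that
// protects the later call from touching a table whose resources were released.
class GuardSketch
{
    static class SSTable { volatile boolean released; }

    static SSTable findDroppableSSTable(Iterable<SSTable> manifestTables, Set<SSTable> compacting)
    {
        for (SSTable sstable : manifestTables)
        {
            // If the compacting snapshot is stale and misses a table that was
            // compacting when the manifest was read, the second clause runs
            // against released resources.
            if (!compacting.contains(sstable) && worthDroppingTombstones(sstable))
                return sstable;
        }
        return null;
    }

    static boolean worthDroppingTombstones(SSTable sstable)
    {
        if (sstable.released)
            throw new IllegalStateException("sstable accessed after its resources were released");
        return true; // tombstone-density heuristic elided
    }
}
{code}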
> The order of events is basically that {{CompactionStrategyManager}} acquires
> the read lock in {{getNextBackgroundTask()}}, then eventually proceeds to
> {{findDroppableSSTable}} and acquires a set of SSTables from the manifest.
> While the manifest is thread-safe, it is not accessed atomically with respect
> to other operations. Once it has acquired the set of tables, it acquires
> (again not atomically) the set of compacting SSTables and iterates, checking
> the former against the latter.
> Meanwhile, other compaction threads are marking tables obsolete or compacted
> and releasing their references. Doing this removes them from {{DataTracker}}
> and publishes a notification to the strategies, but handling this notification
> only requires the read lock. After the compaction thread has published the
> notifications it eventually marks the table as not compacting in
> {{DataTracker}} or removes it entirely.
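The two code paths above, boiled down to a self-contained sketch (names and data structures are simplified stand-ins; only the lock discipline, read lock on both sides, mirrors the description; the real structures are individually thread-safe, plain sets just keep the sketch short):

{code:java}
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class LockDisciplineSketch
{
    final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    final Set<String> manifest = new HashSet<>();   // stands in for the LCS manifest
    final Set<String> compacting = new HashSet<>(); // stands in for DataTracker's compacting set

    // Reader: builds the next background task under the read lock, taking two
    // separate snapshots that are not atomic with respect to writers.
    String getNextBackgroundTask()
    {
        lock.readLock().lock();
        try
        {
            Set<String> fromManifest = new HashSet<>(manifest);    // snapshot #1
            Set<String> nowCompacting = new HashSet<>(compacting); // snapshot #2, taken later
            for (String table : fromManifest)
                if (!nowCompacting.contains(table))
                    return table; // may pick a table that was compacting at snapshot #1
            return null;
        }
        finally
        {
            lock.readLock().unlock();
        }
    }

    // Writer: a finishing compaction publishes its removal notification while
    // holding only the read lock, then clears the compacting flag afterwards.
    void finishCompaction(String table)
    {
        lock.readLock().lock();
        try
        {
            manifest.remove(table); // notification handled under the read lock
        }
        finally
        {
            lock.readLock().unlock();
        }
        compacting.remove(table);   // only now does the table stop being "compacting"
    }
}
{code}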
> The race is then that the compaction thread generating a new background task
> snapshots the set of sstables from the manifest onto its stack. Any table in
> that snapshot that was compacting at that time must remain compacting so that
> it can be skipped. Another compaction thread finishes a compaction and is able
> to remove the table from the manifest and then remove it from the compacting
> set. The thread generating the background task then acquires the set of
> compacting tables, which no longer includes the table it is supposed to skip.
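In terms of the sketch above, the bad interleaving looks like this (hypothetical table {{t1}}, threads T1 and T2):

{code:java}
// T1 = compaction thread finishing a compaction of t1
// T2 = compaction thread generating the next background task
//
// T2: readLock(); snapshot manifest        -> { t1, ... }  (t1 is still compacting here)
// T1: readLock(); remove t1 from the manifest / publish the removal notification; unlock
// T1: compacting.remove(t1)                -> t1 is no longer marked compacting
// T2: snapshot compacting                  -> t1 is absent
// T2: t1 passes the !compacting check, and worthDroppingTombstones(t1)
//     touches resources T1 has already released.
{code}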
> The simple fix appears to be to require threads to acquire the write lock in
> order to publish notifications of tables being added to or removed from
> compaction strategies. With the notification published under the write lock,
> a reader holding the read lock can no longer see a view of the manifest in
> which a table that is still compacting is absent from the compacting set.
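In terms of the same sketch, the fix changes only the writer (replacing {{finishCompaction}} above): the notification is published under the write lock, which cannot be acquired while a reader is sitting between its manifest snapshot and its compacting snapshot. This illustrates the proposed locking, not the patch itself:

{code:java}
    // Writer with the proposed locking: publishing the removal notification now
    // requires the write lock, so it blocks until no reader holds the read lock.
    // A reader therefore either sees t1 in the manifest while it is still in the
    // compacting set, or does not see it in the manifest at all.
    void finishCompaction(String table)
    {
        lock.writeLock().lock();
        try
        {
            manifest.remove(table); // notification handled under the write lock
        }
        finally
        {
            lock.writeLock().unlock();
        }
        compacting.remove(table);
    }
{code}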