[
https://issues.apache.org/jira/browse/CASSANDRA-17519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17525865#comment-17525865
]
Jakub Zytka edited comment on CASSANDRA-17519 at 4/21/22 5:36 PM:
------------------------------------------------------------------
I believe that the get/tidy race condition on 4.1 may cause the obsoletion code
to run unexpectedly, before it is due, potentially leading to some local data
loss. Admittedly, I don't have a real-life scenario in which that happens.
The fact that a failure of the assertion that we had on 4.0 and earlier has not
been seen in the wild suggests that the probability of occurrence is very low.
Still, I preferred to err on the safe side, and thus the bug has been
categorized as a recoverable loss.
> races/leaks in SSTableReader::GlobalTidy
> ----------------------------------------
>
> Key: CASSANDRA-17519
> URL: https://issues.apache.org/jira/browse/CASSANDRA-17519
> Project: Cassandra
> Issue Type: Bug
> Components: Legacy/Core
> Reporter: Jakub Zytka
> Assignee: Jakub Zytka
> Priority: Normal
> Attachments: CASSANDRA-17519-4.0.txt, CASSANDRA-17519-4.1-fix.txt,
> CASSANDRA-17519-4.1-test-exposing-the-problem.txt
>
>
> In Cassandra 4.0/3.11 there are at least two races in
> SSTableReader::GlobalTidy.
> One is a get/get race, explicitly handled as an assertion in:
> [http://github.com/apache/cassandra/blob/c22accc46458d0a583afcf6a980f731cdcc94465/src/java/org/apache/cassandra/io/sstable/format/SSTableReader.java#L2199-L2204]
> and it looks like "ok, it's a problem, but let's just not fix it".
> The other one is a get/tidy race between
> [http://github.com/apache/cassandra/blob/c22accc46458d0a583afcf6a980f731cdcc94465/src/java/org/apache/cassandra/io/sstable/format/SSTableReader.java#L2194-L2196]
> and
> [http://github.com/apache/cassandra/blob/c22accc46458d0a583afcf6a980f731cdcc94465/src/java/org/apache/cassandra/io/sstable/format/SSTableReader.java#L2174-L2175]
>
> The second one can be easily hit by adding a small delay (say, 20ms) at the
> beginning of the `tidy()` method and running `LongStreamingTest` (in fact,
> such a failure is what prompted the investigation of GlobalTidy correctness).
> There was an attempt on `trunk` to fix these two races.
> The details are not clear to me, and it all looks quite weird. I might be
> mistaken, but as far as I can see the relevant changes were introduced in:
> [https://github.com/apache/cassandra/commit/31bea0b0d41e4e81095f0d088094f03db14af490]
> which is piggybacked on a huge change in CASSANDRA-17008, without a separate
> ticket or any sort of QA.
> As far as I can see, this attempt changes the first race into a leak, and the
> second race into another race, this time allowing multiple GlobalTidy
> objects to exist for the same sstable (and, as a result, allowing the
> obsoletion code to run prematurely).
> I'll follow up with PRs for relevant branches etc etc
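
To make the shape of the problem concrete, here is a minimal sketch of one way to keep "at most one shared tidy object per key" without a get/get or get/tidy window. It is not the code from SSTableReader and not the content of the attached patches. `SharedTidy`, `TidyRegistry`, `acquire`, `release` and `cleanupsRun` are hypothetical names made up for illustration; the only real API used is `ConcurrentHashMap.compute`, which runs the remapping function atomically for a given key. The idea is that lookup-or-create and release-and-remove both go through `compute`, so a concurrent acquire can never observe an entry that is halfway through being tidied, and two acquirers can never create two instances for the same key.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a per-sstable tidy object; not Cassandra's GlobalTidy. */
final class SharedTidy
{
    final String key;
    int refs; // only touched inside TidyRegistry's compute() calls for this key

    SharedTidy(String key) { this.key = key; }
}

/** Hypothetical registry keeping at most one live SharedTidy per key. */
final class TidyRegistry
{
    private final ConcurrentHashMap<String, SharedTidy> lookup = new ConcurrentHashMap<>();
    final AtomicInteger cleanupsRun = new AtomicInteger();

    /** Atomically creates the instance for this key or takes one more reference to it. */
    SharedTidy acquire(String key)
    {
        return lookup.compute(key, (k, existing) -> {
            SharedTidy tidy = existing != null ? existing : new SharedTidy(k);
            tidy.refs++;
            return tidy;
        });
    }

    /** Drops one reference; the last release removes the entry and runs cleanup once. */
    void release(SharedTidy tidy)
    {
        lookup.compute(tidy.key, (k, existing) -> {
            if (existing != tidy)
                throw new IllegalStateException("released an instance that is not registered");
            if (--existing.refs > 0)
                return existing;
            cleanupsRun.incrementAndGet(); // stand-in for the obsoletion/cleanup work
            return null;                   // returning null removes the mapping
        });
    }

    int liveInstances() { return lookup.size(); }
}

final class GlobalTidySketch
{
    public static void main(String[] args) throws InterruptedException
    {
        TidyRegistry registry = new TidyRegistry();
        // Two threads repeatedly acquire and release the same key.
        Runnable worker = () -> {
            for (int i = 0; i < 10_000; i++)
                registry.release(registry.acquire("sstable-1"));
        };
        Thread a = new Thread(worker), b = new Thread(worker);
        a.start(); b.start();
        a.join(); b.join();
        System.out.println("live instances: " + registry.liveInstances());
    }
}
```

One caveat on the design: the sketch runs the cleanup stand-in inside `compute`, which is only reasonable because it is a counter increment. Real tidy work is much heavier, so an actual fix would probably have to decide the "last reference" question atomically and do the expensive part outside the map operation.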