[
https://issues.apache.org/jira/browse/CASSANDRA-19617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Benedict Elliott Smith updated CASSANDRA-19617:
-----------------------------------------------
Bug Category: Correctness > Recoverable Corruption / Loss
Complexity: Byzantine
Discovered By: Diff Testing
Fix Version/s: 4.1.x, 5.0-rc
Severity: Critical
Assignee: Benedict Elliott Smith
Status: Open (was: Triage Needed)
> Paxos may re-distribute stale commits that predate a collectable tombstone
> --------------------------------------------------------------------------
>
> Key: CASSANDRA-19617
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19617
> Project: Cassandra
> Issue Type: Bug
> Components: Consistency/Coordination
> Reporter: Benedict Elliott Smith
> Assignee: Benedict Elliott Smith
> Priority: Normal
> Fix For: 4.1.x, 5.0-rc
>
>
> Note: this bug only affects clusters using {{paxos_state_purging: gc_grace}}
> or {{paxos_state_purging: repaired}}, i.e. the purging modes introduced
> alongside Paxos v2.
> There are two problems:
> 1) Purging is applied only on compaction, not on load, which can lead to very
> old commits being resurfaced in certain circumstances.
> 2) PaxosPrepare does not filter commits based on the paxos repair low bound.
> This permits surprising situations to arise, where some replicas purge a
> stale commit _and all newer commits_, but due to compaction peculiarities
> some other replica may purge only the newer commits, leaving a stale commit
> in some compaction "purgatory"\[1] to be returned to reads indefinitely.
> So long as there are no newer commits, the paxos coordinator will see this
> commit is not universally known and redistribute it - no matter how old it
> is. This can permit an insert to be reapplied after GC grace has elapsed and
> the tombstone has been collected.
> For proposals this is not a problem, as we correctly filter proposals based
> on the last paxos repair time. This also does not affect clusters with the
> legacy (and default) paxos state purging using TTL. Problem (1) applies only
> to the new {{gc_grace}} compatibility mode for purging.
> \[1] Compaction purgatory can arise, for instance, because paxos purging
> allows whole sstables to be erased quite effectively. If this ordinarily
> prevents sstables from being promoted to L1, then sstables that do reach L1
> for some abnormal reason (e.g. repairs being disabled for some time) may
> collect there and remain uncompacted for an extended period without purging
> being applied.
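
Problem (2) in the description can be illustrated with a toy model. The sketch below is not Cassandra code: the class StaleCommitDemo, the Commit type, and both methods are hypothetical names invented for illustration, ballots are reduced to plain long values, and it assumes that a commit whose ballot is older than the paxos repair low bound can be treated as if the replica had not reported it. It shows a coordinator that redistributes the newest reported commit whenever some replica does not know it, and how filtering against a low bound stops a stale commit from being redistributed.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;

// Toy model only; none of these names exist in Cassandra.
final class StaleCommitDemo {
    // Hypothetical stand-in for a committed Paxos value; the ballot is a plain long.
    static final class Commit {
        final long ballot;
        final String value;
        Commit(long ballot, String value) { this.ballot = ballot; this.value = value; }
    }

    // Assumption: a commit older than the low bound is treated as absent.
    static Optional<Commit> dropIfBelowLowBound(Optional<Commit> reported, long lowBound) {
        return reported.filter(c -> c.ballot >= lowBound);
    }

    // Returns the commit the coordinator would redistribute: the newest commit
    // reported by any replica, provided at least one replica did not report it.
    static Optional<Commit> commitToRedistribute(List<Optional<Commit>> responses, long lowBound) {
        List<Optional<Commit>> visible = responses.stream()
                .map(r -> dropIfBelowLowBound(r, lowBound))
                .collect(Collectors.toList());
        Optional<Commit> newest = visible.stream()
                .filter(Optional::isPresent)
                .map(Optional::get)
                .max((a, b) -> Long.compare(a.ballot, b.ballot));
        if (!newest.isPresent()) return Optional.empty();
        long newestBallot = newest.get().ballot;
        boolean universallyKnown = visible.stream()
                .allMatch(r -> r.isPresent() && r.get().ballot == newestBallot);
        return universallyKnown ? Optional.empty() : newest;
    }

    public static void main(String[] args) {
        // One replica still holds a stale commit; the other two purged everything.
        Commit stale = new Commit(100L, "insert");
        List<Optional<Commit>> responses = Arrays.asList(
                Optional.of(stale), Optional.<Commit>empty(), Optional.<Commit>empty());

        // Long.MIN_VALUE as the low bound means no commit is ever filtered out.
        System.out.println("redistributed without filter: "
                + commitToRedistribute(responses, Long.MIN_VALUE).isPresent());
        System.out.println("redistributed with low bound 500: "
                + commitToRedistribute(responses, 500L).isPresent());
    }
}
```

In this model the unfiltered coordinator picks the stale commit because it is the newest one it can see and two replicas lack it, which mirrors the reapplied insert described above. With the low bound applied, every response looks empty and nothing is redistributed.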
--
This message was sent by Atlassian Jira
(v8.20.10#820010)