[
https://issues.apache.org/jira/browse/CASSANDRA-8346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sylvain Lebresne updated CASSANDRA-8346:
----------------------------------------
Attachment: 8346.txt
I don't think there is much we can do to fix this properly (reading from the
pending endpoints could also return stale data, since those aren't yet up to
date), so I think the simplest fix is to throw an UnavailableException if we
have 2 or more pending endpoints. Attaching a patch to do that.
> Paxos operation can use stale data during multiple range movements
> ------------------------------------------------------------------
>
> Key: CASSANDRA-8346
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8346
> Project: Cassandra
> Issue Type: Bug
> Reporter: Sylvain Lebresne
> Assignee: Sylvain Lebresne
> Fix For: 2.0.12
>
> Attachments: 8346.txt
>
>
> Paxos operations correctly account for pending ranges for all operations
> pertaining to the Paxos state, but those pending ranges are not taken into
> account when reading the data to check for the conditions or during a serial
> read. It's thus possible to break the LWT guarantees by reading a stale
> value. This requires 2 node movements (on the same token range) to be a
> problem though.
> Basically, we have {{RF}} replicas + {{P}} pending nodes. For the Paxos
> prepare/propose phases, the number of required participants (the "Paxos
> QUORUM") is {{(RF + P + 1) / 2}} ({{SP.getPaxosParticipants}}), but the read
> done to check conditions or for serial reads is done at a "normal" QUORUM (or
> LOCAL_QUORUM), and so at a weaker {{(RF + 1) / 2}}. We have a problem if that
> read can be served entirely by nodes that were not among the Paxos
> participants, that is, if:
> {noformat}
> "normal quorum" == (RF + 1) / 2 <= (RF + P) - ((RF + P + 1) / 2) ==
> "participants considered - blocked for"
> {noformat}
> We're good if {{P = 0}} or {{P = 1}} since this inequality gives us
> respectively {{RF + 1 <= RF - 1}} and {{RF + 1 <= RF}}, both of which are
> impossible. But at {{P = 2}} (2 pending nodes), this inequality is equivalent
> to {{RF <= RF}} and so we might read stale data.
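
The inequality in the description can be checked numerically. The following is a minimal sketch, not Cassandra code: the function names are hypothetical, and it assumes that both quorums are integer majorities (n / 2 rounded down, plus 1), whereas the ticket writes them algebraically as (RF + 1) / 2 and (RF + P + 1) / 2.

```python
def majority(n: int) -> int:
    """Smallest node count that is a strict majority of n (assumed quorum rule)."""
    return n // 2 + 1


def stale_read_possible(rf: int, pending: int) -> bool:
    """Restate the ticket's inequality: can a normal QUORUM read be satisfied
    using only nodes that were outside the Paxos quorum?"""
    read_quorum = majority(rf)              # nodes the condition/serial read blocks for
    paxos_quorum = majority(rf + pending)   # nodes prepare/propose block for
    non_participants = (rf + pending) - paxos_quorum
    return read_quorum <= non_participants


if __name__ == "__main__":
    # Hypothetical example: RF = 3 with 0, 1 and 2 pending nodes.
    for pending in range(3):
        result = stale_read_possible(3, pending)
        print(f"RF=3, P={pending}: stale read possible = {result}")
```

Under these assumptions, RF = 3 shows no risk for P = 0 or P = 1 and a possible stale read at P = 2, which matches the reasoning in the description.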