[
https://issues.apache.org/jira/browse/CASSANDRA-5062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13588567#comment-13588567
]
Cristian Opris edited comment on CASSANDRA-5062 at 2/27/13 6:09 PM:
--------------------------------------------------------------------
Note that a proposal may eventually succeed on recovery even if fewer than a
quorum of replicas managed to ack it before the leader failed (and the client
timed out). Quorum writes are needed to be able to survive F failures out of
2F+1 replicas. Reads are not quorum reads, just replica-local reads.
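To make the 2F+1 arithmetic concrete, here is a minimal sketch. The function names are my own for illustration and do not come from the paper or from Cassandra.

```python
def quorum_size(replicas: int) -> int:
    """Smallest majority of `replicas` nodes."""
    return replicas // 2 + 1


def tolerated_failures(replicas: int) -> int:
    """Largest F such that replicas >= 2F + 1."""
    return (replicas - 1) // 2
```

With 5 replicas this gives a quorum of 3 and F = 2, which matches the example that follows.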
Let's say we have 5 replicas with F1 as the leader. F4 and F5 are ignored here
since they don't matter.
{code}
1a F1 -> proposal -> F2
1b F1 <- ack <- F2
2a F1 -> proposal -> F3
2b F1 <- ack <- F3
3a F1 -> OK -> client
3b F1 -> COMMIT -> F2,F3
{code}
If F1 fails immediately after step 1b, F2 would become the leader since it has
the latest seq number. At that point only F2 has the proposal, but it can
continue and commit it to the other followers.
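The failover described above can be sketched roughly as follows. This is a toy model under my own assumptions, not the paper's algorithm: `Replica`, `elect_leader` and `recover` are hypothetical names, the "seq number" is simply the log length, and ties are broken by replica name.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    name: str
    alive: bool = True
    log: list = field(default_factory=list)  # accepted proposals, in order

    @property
    def last_seq(self) -> int:
        # Assumption: the seq number is just the number of accepted proposals.
        return len(self.log)


def elect_leader(replicas):
    """Pick the live replica with the highest seq number (name breaks ties)."""
    live = [r for r in replicas if r.alive]
    return min(live, key=lambda r: (-r.last_seq, r.name))


def recover(leader, replicas):
    """Re-propose the leader's log to live followers.

    Returns True if a quorum (counting the leader itself) now holds the log.
    """
    quorum = len(replicas) // 2 + 1
    acks = 1  # the leader already holds its own log
    for r in replicas:
        if r.alive and r is not leader:
            r.log = list(leader.log)
            acks += 1
    return acks >= quorum
```

In the scenario above (F1 and F2 hold the proposal, then F1 dies), `elect_leader` picks F2 and `recover` returns True because F2, F3, F4 and F5 form a majority of the 5 replicas.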
If F2 can't get a quorum (maybe it's partitioned into a minority), then it
gives up leadership. When it rejoins the majority, it runs another recovery
procedure that uses epoch numbers to determine whether it needs to throw away
that proposal. This is fine since no client has actually received confirmation
that the proposal was committed. This is detailed in the paper.
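As a rough illustration of the rejoin step, the sketch below tags each log entry with an (epoch, seq) pair and drops whatever diverges from the majority's history. This is my own simplification for discussion, not the procedure from the paper; `reconcile` and the tuple layout are assumptions.

```python
def reconcile(rejoining_log, majority_log):
    """Sync a rejoining replica with the majority leader's log.

    Entries are (epoch, seq, value) tuples. Returns the replica's new log and
    the list of entries it had to throw away.
    """
    # Length of the longest common prefix of the two histories.
    common = 0
    for mine, theirs in zip(rejoining_log, majority_log):
        if mine != theirs:
            break
        common += 1
    # Anything past the common prefix was never committed by the majority.
    discarded = rejoining_log[common:]
    return list(majority_log), discarded
```

If the majority moved on to a newer epoch without the stale proposal, that proposal ends up in `discarded`. If the majority did commit it, the entry is part of the common prefix and nothing is lost.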
> Support CAS
> -----------
>
> Key: CASSANDRA-5062
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5062
> Project: Cassandra
> Issue Type: New Feature
> Components: API, Core
> Reporter: Jonathan Ellis
> Fix For: 2.0
>
>
> "Strong" consistency is not enough to prevent race conditions. The classic
> example is user account creation: we want to ensure usernames are unique, so
> we only want to signal account creation success if nobody else has created
> the account yet. But naive read-then-write allows clients to race and both
> think they have a green light to create.