[
https://issues.apache.org/jira/browse/CASSANDRA-5062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13588400#comment-13588400
]
Jonathan Ellis edited comment on CASSANDRA-5062 at 2/27/13 2:51 PM:
--------------------------------------------------------------------
bq. probably the coordinator should hint something when he doesn't get the
commit-ack from the 2 replicas that died
This is racy, though; if the coordinator also dies, then we still lose.
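A toy model of why coordinator-side hints are racy. It assumes that hints exist only in the coordinator's memory until they are recorded, so a coordinator that dies first records nothing. `commit_with_hints` is a hypothetical name for illustration, not Cassandra's hinted-handoff code.

```python
def commit_with_hints(replicas_acked, coordinator_survives):
    """Return the replicas that eventually hold the commit.

    replicas_acked maps replica name -> whether it acked the commit.
    coordinator_survives says whether the coordinator lives long enough
    to record hints for the replicas that did not ack (assumption).
    """
    holders = {name for name, acked in replicas_acked.items() if acked}
    hints = []
    if coordinator_survives:
        # One hint per replica that missed the commit.
        hints = [name for name, acked in replicas_acked.items() if not acked]
    # When the missing replicas come back, recorded hints are replayed to them.
    holders.update(hints)
    return holders

replicas = {"A": True, "B": False, "C": False}
print(sorted(commit_with_hints(replicas, coordinator_survives=True)))   # ['A', 'B', 'C']
print(sorted(commit_with_hints(replicas, coordinator_survives=False)))  # ['A']
```

In the second run only one of three replicas ever holds the commit, which is the "we still lose" case.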
FWIW, Spinnaker's solution is actually pretty dicey here too: the leader does
2PC, and if the leader does not get a majority of acks back for its proposal,
it will report the op as failed. But it doesn't actually abort or revert the
proposal on the followers. (And if it tried, it would still be open to a race,
where it fails before aborting, leaving some proposals extant.)
Then, when a new leader is elected, it replays the proposals it has not yet
committed. So a proposal that originally failed, and was returned as such to
the client, could end up committed after failover. That is unexpected at
best, and in the CAS case I'm pretty sure it is outright broken.
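A minimal sketch of that anomaly, under simplifying assumptions: a three-node group, a leader that counts its own log write as an ack, and a follower whose ack is lost after it has already logged the proposal. `Replica`, `propose` and `fail_over` are hypothetical names, not Spinnaker's actual code, and the commit path on success is left out.

```python
from dataclasses import dataclass, field

@dataclass
class Replica:
    name: str
    log: list = field(default_factory=list)        # accepted but not committed
    committed: list = field(default_factory=list)

def propose(leader, followers, value, link):
    """One leader-driven round. link[name] is 'ok', 'ack_lost' or 'down'."""
    leader.log.append(value)
    acks = 1  # the leader counts its own log write
    for f in followers:
        if link[f.name] == "down":
            continue                  # follower never sees the proposal
        f.log.append(value)           # follower durably logs the proposal
        if link[f.name] == "ok":
            acks += 1                 # only delivered acks are counted
    if acks > (1 + len(followers)) // 2:
        return "success"              # commit messages omitted from this sketch
    # The client is told "fail", but no abort is sent to the followers.
    return "fail"

def fail_over(new_leader):
    """A newly elected leader commits everything it logged but never committed."""
    new_leader.committed.extend(new_leader.log)
    new_leader.log.clear()

old, f1, f2 = Replica("L"), Replica("F1"), Replica("F2")
result = propose(old, [f1, f2], "x=1", {"F1": "ack_lost", "F2": "down"})
print(result)        # fail
# The old leader crashes; F1 holds the proposal and becomes the new leader.
fail_over(f1)
print(f1.committed)  # ['x=1']
```

The client saw `fail`, yet `x=1` ends up committed on the new leader.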
I think Sergio's proposal has a similar problem: if the leader reports success
to the client after local commit, but before it has been committed to the
followers, we could either (1) lose the commit on failover if followers are
pessimistic, or (2) commit data that we originally reported failed as in
Spinnaker if we are optimistic. On the other hand, if the leader tries to wait
for commit acks from the followers before reporting to the client, it could block
indefinitely during a partition, so that is no solution either.
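The two failover policies can be compared in a small model. `recover` and `anomalies` are hypothetical names, and the plain lists stand in for whatever log state a real follower would keep. The model shows that each policy avoids one anomaly only by producing the other.

```python
def recover(logged, committed, policy):
    """State a newly elected follower ends up with, given proposals it
    logged but never saw a commit for."""
    if policy == "pessimistic":
        return list(committed)                 # discard uncommitted proposals
    if policy == "optimistic":
        return list(committed) + list(logged)  # commit all of them
    raise ValueError(f"unknown policy: {policy}")

def anomalies(policy):
    # What the old leader told clients before it died: 'a' was reported as a
    # success after its local commit; 'b' was reported as a failure.
    client_view = {"a": "success", "b": "fail"}
    # Both proposals reached the follower's log; no commit reached it.
    state = recover(logged=["a", "b"], committed=[], policy=policy)
    lost = [v for v, r in client_view.items() if r == "success" and v not in state]
    resurrected = [v for v, r in client_view.items() if r == "fail" and v in state]
    return lost, resurrected

print(anomalies("pessimistic"))  # (['a'], [])
print(anomalies("optimistic"))   # ([], ['b'])
```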
> Support CAS
> -----------
>
> Key: CASSANDRA-5062
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5062
> Project: Cassandra
> Issue Type: New Feature
> Components: API, Core
> Reporter: Jonathan Ellis
> Fix For: 2.0
>
>
> "Strong" consistency is not enough to prevent race conditions. The classic
> example is user account creation: we want to ensure usernames are unique, so
> we only want to signal account creation success if nobody else has created
> the account yet. But naive read-then-write allows clients to race and both
> think they have a green light to create.
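To make the race in the description concrete, here is a toy single-process store. It is hypothetical and is not Cassandra's API. The `cas` method is only a lock-protected compare-and-set on one node. The hard part this issue deals with is getting the same atomicity across replicas.

```python
import threading

class Store:
    """Toy single-node key-value store; not Cassandra's API."""
    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def get(self, key):
        return self._data.get(key)

    def put(self, key, value):
        self._data[key] = value

    def cas(self, key, expected, new):
        """Atomically set key to new only if its current value equals expected
        (None means 'absent')."""
        with self._lock:
            if self._data.get(key) != expected:
                return False
            self._data[key] = new
            return True

def naive_signup(store):
    # Both clients read before either writes: the interleaving that causes the race.
    alice_saw, bob_saw = store.get("jdoe"), store.get("jdoe")
    winners = []
    if alice_saw is None:
        store.put("jdoe", "alice")
        winners.append("alice")
    if bob_saw is None:
        store.put("jdoe", "bob")   # silently overwrites alice's account
        winners.append("bob")
    return winners

def cas_signup(store):
    # Each client creates the account only if it is still absent.
    return [who for who in ("alice", "bob") if store.cas("jdoe", None, who)]

print(naive_signup(Store()))  # ['alice', 'bob']  (both told "created")
print(cas_signup(Store()))    # ['alice']
```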