[ https://issues.apache.org/jira/browse/CASSANDRA-5062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13626715#comment-13626715 ]
Jonathan Ellis commented on CASSANDRA-5062:
-------------------------------------------

bq. We can use one true paxos state per row (which is the alternative I'm suggesting) but still use a 'hashcode % 1024' number of locks for actually protecting the read/write to the paxos system table.

That doesn't solve the problem I'm thinking of. Suppose Proposer1 (P1) asks for a promise of ballot 1 (B1). Then P2 asks for a promise of B2. Meanwhile, P1 proposes (B1, V1). If PaxosState.propose is not locked vs prepare for that row, then the replica could promise B2 but subsequently accept (B1, V1), which is a violation of the protocol.

bq. we'll still wait rpcTimeout (10 seconds by default!)

I'm okay with this since nodes going down mid-request before FD notices is a pretty rare case. I further note that anything that starts with "only wait for quorum [even if there are other live nodes around]" means that it's more likely that the replicas we don't wait for will stay out of date longer than they "should" for the MRC check. So I see this as two alternatives, each with drawbacks, rather than one strictly better than the other. And I prefer the alternative that already has code written. :)

bq. I'm (strongly) convinced that throwing this UAE is wrong in the first place

Huh? If too many replicas are down and we don't get a promise, your logic says we should retry with a new (higher) ballot, in which case we will get UAE almost all the time, since few node failure modes will result in a recovery in under 100ms.

> Support CAS
> -----------
>
>                 Key: CASSANDRA-5062
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5062
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: API, Core
>            Reporter: Jonathan Ellis
>             Fix For: 2.0
>
>         Attachments: half-baked commit 1.jpg, half-baked commit 2.jpg, half-baked commit 3.jpg
>
>
> "Strong" consistency is not enough to prevent race conditions.
> The classic example is user account creation: we want to ensure usernames
> are unique, so we only want to signal account creation success if nobody
> else has created the account yet. But naive read-then-write allows clients
> to race and both think they have a green light to create.
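
The interleaving described in the comment (P1 gets a promise for B1, P2 gets a promise for B2, then P1 proposes (B1, V1)) can be sketched roughly as follows. This is only an illustration under stated assumptions: `StripedPaxosSketch`, `RowState`, and the method signatures are hypothetical and are not Cassandra's actual `PaxosState`; ballots are plain `long` values, and state lives in memory rather than in the paxos system table. The sketch keeps one state per row but takes the lock from a pool of 1024 stripes, and it holds the same stripe lock in both `prepare` and `propose`, so a replica cannot promise B2 and then accept (B1, V1).

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical replica-side sketch; not Cassandra's PaxosState. */
final class StripedPaxosSketch {
    // Number of lock stripes, following the 'hashcode % 1024' idea in the quoted text.
    static final int STRIPES = 1024;

    /** Per-row Paxos state (assumed shape, for illustration only). */
    private static final class RowState {
        long promised = -1;        // highest ballot promised so far
        long acceptedBallot = -1;  // ballot of the accepted proposal, if any
        String acceptedValue;      // value of the accepted proposal, if any
    }

    private final Object[] locks = new Object[STRIPES];
    private final ConcurrentMap<String, RowState> states = new ConcurrentHashMap<>();

    StripedPaxosSketch() {
        for (int i = 0; i < STRIPES; i++) {
            locks[i] = new Object();
        }
    }

    // floorMod keeps the index non-negative even when hashCode() is negative.
    private Object lockFor(String rowKey) {
        return locks[Math.floorMod(rowKey.hashCode(), STRIPES)];
    }

    /** Phase 1: promise only if the ballot is newer than any earlier promise. */
    boolean prepare(String rowKey, long ballot) {
        synchronized (lockFor(rowKey)) {
            RowState s = states.computeIfAbsent(rowKey, k -> new RowState());
            if (ballot > s.promised) {
                s.promised = ballot;
                return true;
            }
            return false;
        }
    }

    /** Phase 2: accept only if no higher ballot has been promised meanwhile. */
    boolean propose(String rowKey, long ballot, String value) {
        // Same stripe lock as prepare(), so the two cannot interleave for a row.
        synchronized (lockFor(rowKey)) {
            RowState s = states.computeIfAbsent(rowKey, k -> new RowState());
            if (ballot >= s.promised) {
                s.promised = ballot;
                s.acceptedBallot = ballot;
                s.acceptedValue = value;
                return true;
            }
            return false;
        }
    }

    /** Returns the accepted value for the row, or null if nothing was accepted. */
    String acceptedValue(String rowKey) {
        synchronized (lockFor(rowKey)) {
            RowState s = states.get(rowKey);
            return s == null ? null : s.acceptedValue;
        }
    }
}
```

With this sketch, calling `prepare("row", 1)`, then `prepare("row", 2)`, then `propose("row", 1, "V1")` makes the last call return `false`, because the promise for ballot 2 is already visible under the shared lock. Unrelated rows that hash to the same stripe will contend with each other, which is the trade-off of striping instead of one lock per row.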