[
https://issues.apache.org/jira/browse/CASSANDRA-5062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13591890#comment-13591890
]
Cristian Opris edited comment on CASSANDRA-5062 at 3/3/13 11:09 PM:
--------------------------------------------------------------------
OK, I believe what you're proposing is very close to what I am thinking.
Essentially you're using mostRecentCommit timestamp (mrc) to track the paxos
instance, while I am proposing to use a sequence value that is incremented on
local commit.
I expect that in your case as well this epoch number, as we might call it, is
distinct from the proposal number, which can indeed be a timestamp (timeuuid).
It seems this epoch doesn't have to be sequential, so a timestamp could work. (I
would still go with a sequence, just to avoid depending on the clock at all, but
that's not necessary.)
I reworked the example above with more detail, and it seems correct:
{code}
R1 R2 R3
Ct0 Ct0 Ct0 //initial state at t0
Ptn(mrc=t0) <- //R3 makes a proposal numbered tn with most recent committed t0
-- ok --> //R2 promises
Atn< Atn < //accept at tn > t0
Atn -> Ctn //R3 commits Ctn, mrc=tn, accept is cleared
---> Ptn+m(mrc=t0) > //R1 makes a proposal tn+m with mrc=t0, the last it knows of
<--- nack (Ctn) //R3 rejects since mrc is stale; sends Ctn directly for R1 to learn
Ctn
---> Ptn+m(mrc=tn) //propose again at mrc=tn
<- ok ---------------- //R3 promises since mrc up to date
>Atn+m >Atn+m //R3 accepts new value at tn+m > tn
>Ctn+m
{code}
State:
R1=(Ctn+m), R2=(Ct0,Atn), R3=(Ctn,Atn+m)
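
The exchange above can be replayed as a small simulation. This is only a minimal sketch under several assumptions: ballots are plain integers standing in for t0 < tn < tn+m, values and networking are left out, and the `Replica` class with its `prepare`/`accept`/`commit` methods is hypothetical naming for illustration, not Cassandra code. Each replica state is reported as (most recent commit, accepted-but-uncommitted ballot).

```python
from dataclasses import dataclass
from typing import Optional, Tuple

T0, TN, TNM = 0, 10, 15  # illustrative stand-ins for t0 < tn < tn+m


@dataclass
class Replica:
    mrc: int = T0                   # ballot of the most recent commit seen locally
    promised: int = T0              # highest ballot promised so far
    accepted: Optional[int] = None  # accepted but not yet committed ballot

    def prepare(self, ballot: int, proposer_mrc: int) -> Tuple[str, Optional[int]]:
        if proposer_mrc < self.mrc:
            return ("nack", self.mrc)         # proposer is stale: hand back our commit
        if ballot <= self.promised:
            return ("reject", self.promised)  # ordinary Paxos ballot check
        self.promised = ballot
        return ("ok", self.accepted)

    def accept(self, ballot: int) -> bool:
        if ballot < self.promised:
            return False
        self.promised = self.accepted = ballot
        return True

    def commit(self, ballot: int) -> None:
        self.mrc = max(self.mrc, ballot)
        if self.accepted is not None and self.accepted <= ballot:
            self.accepted = None              # accept is cleared on commit

    def state(self) -> Tuple[int, Optional[int]]:
        return (self.mrc, self.accepted)


def replay():
    r1, r2, r3 = Replica(), Replica(), Replica()
    # Round 1: R3 proposes tn with mrc=t0; R2 and R3 accept; only R3 commits.
    assert r2.prepare(TN, r3.mrc)[0] == "ok"
    assert r3.prepare(TN, r3.mrc)[0] == "ok"
    assert r2.accept(TN) and r3.accept(TN)
    r3.commit(TN)
    # Round 2: R1 proposes tn+m with a stale mrc=t0 and gets a nack carrying Ctn.
    verdict, learned = r3.prepare(TNM, r1.mrc)
    assert verdict == "nack"
    r1.commit(learned)                        # R1 learns Ctn
    # R1 proposes again with mrc=tn; R1 and R3 accept; only R1 commits.
    assert r3.prepare(TNM, r1.mrc)[0] == "ok"
    assert r1.prepare(TNM, r1.mrc)[0] == "ok"
    assert r1.accept(TNM) and r3.accept(TNM)
    r1.commit(TNM)
    return {"R1": r1.state(), "R2": r2.state(), "R3": r3.state()}
```

With these stand-in ballots, `replay()` returns `{"R1": (15, None), "R2": (0, 10), "R3": (10, 15)}`, which corresponds to R1=(Ctn+m), R2=(Ct0,Atn), R3=(Ctn,Atn+m).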
Now I think this is pretty much like the variant with the version counter above.
To do a consistent read, the reader may have to complete the Paxos round for
Atn+m, but the read is guaranteed to resolve to Ctn+m whichever quorum it reads.
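
The quorum claim can be checked mechanically against the final state. The following is a rough sketch, assuming integer ballots (t0=0, tn=10, tn+m=15) and a hypothetical `resolve_read` helper; it only computes which ballot a reader would end up with, and does not perform the actual re-proposal and commit that completing the round would require.

```python
from itertools import combinations

# Final state from the example as (most recent commit, accepted-but-uncommitted).
# Ballots are illustrative integers: t0=0, tn=10, tn+m=15.
STATE = {"R1": (15, None), "R2": (0, 10), "R3": (10, 15)}


def resolve_read(replies):
    """Ballot a reader resolves to from one quorum of (mrc, accepted) replies."""
    latest_commit = max(mrc for mrc, _ in replies)
    # An accepted ballot newer than every commit seen is an in-progress round
    # that the reader would have to complete before answering.
    in_progress = [acc for _, acc in replies
                   if acc is not None and acc > latest_commit]
    return max(in_progress, default=latest_commit)


def resolve_all_quorums(state, quorum_size=2):
    """Resolve a read against every possible quorum of the given size."""
    return {quorum: resolve_read([state[name] for name in quorum])
            for quorum in combinations(sorted(state), quorum_size)}
```

Under these assumptions every two-replica quorum resolves to 15 (tn+m): {R1,R2} and {R1,R3} see the commit on R1 directly, while {R2,R3} sees Atn+m on R3, which is newer than any commit in that quorum and so has to be completed. R2's leftover Atn is ignored whenever a commit at tn or later is visible.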
was (Author: [email protected]):
OK, I believe what you're proposing is very close to what I am thinking.
Essentially you're using mostRecentCommit timestamp (mrc) to track the paxos
instance, while I am proposing to use a sequence value that is incremented on
local commit.
I expect that in your case as well this epoch number, as we might call it, is
distinct from the proposal number, which can indeed be a timestamp (timeuuid).
It seems this epoch doesn't have to be sequential, so a timestamp could work. (I
would still go with a sequence, just to avoid depending on the clock at all, but
that's not necessary.)
I reworked the example above with more detail, and it seems correct:
{code}
R1 R2 R3
Ct0 Ct0 Ct0 //initial state at t0
Ptn(epoch=t0) <- //R3 makes a proposal numbered tn with mRC=t0
promise(Ptn) --> //R2 promises
Atn< Atn < //accept at Tn > t0
Atn -> Ctn //R3 commits Ctn, mrc=tn, accept is cleared
---> Ptn+m(mrc=t0) > //R1 makes a proposal tn+m with mRC=t0, the last it knows of
<--- nack (Ctn) //R3 rejects since mRC is stale; sends Ctn directly for R1 to learn
Ctn
---> Ptn+m(mrc=tn) //propose again at mRC=tn
<- ok ---------------- //R3 promises
>Atn+m >Atn+m //R3 accepts new value at tn+m > tn, this is now valid
>Ctn+m
{code}
State:
R1=(Ctn+m), R2=(Ct0,Atn), R3=(Ctn,Atn+m)
Now I think this is pretty much like the variant with the version counter above.
To do a consistent read, the reader may have to complete the Paxos round for
Atn+m, but the read is guaranteed to resolve to Ctn+m whichever quorum it reads.
> Support CAS
> -----------
>
> Key: CASSANDRA-5062
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5062
> Project: Cassandra
> Issue Type: New Feature
> Components: API, Core
> Reporter: Jonathan Ellis
> Fix For: 2.0
>
> Attachments: half-baked commit 1.jpg, half-baked commit 2.jpg,
> half-baked commit 3.jpg
>
>
> "Strong" consistency is not enough to prevent race conditions. The classic
> example is user account creation: we want to ensure usernames are unique, so
> we only want to signal account creation success if nobody else has created
> the account yet. But naive read-then-write allows clients to race and both
> think they have a green light to create.