[ 
https://issues.apache.org/jira/browse/CASSANDRA-5062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13591890#comment-13591890
 ] 

Cristian Opris edited comment on CASSANDRA-5062 at 3/3/13 11:09 PM:
--------------------------------------------------------------------

OK, I believe what you're proposing is very close to what I am thinking.

Essentially, you're using the mostRecentCommit timestamp (mrc) to track the Paxos instance, while I am proposing a sequence value that is incremented on each local commit.

I expect that in your case, too, this number (let's call it the epoch) is distinct from the proposal number, which can indeed be a timestamp (timeuuid).

It seems this epoch doesn't have to be sequential, so a timestamp could work. (I would still go with a sequence, just so as not to depend on the clock at all, but it's not necessary.)
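To make the acceptor-side rule concrete, here is a minimal Python sketch of the prepare check, assuming a replica keeps its most recent commit, the highest proposal number it has promised, and any in-progress accept. All names (`Commit`, `Acceptor`, `prepare`) are hypothetical, not Cassandra code, and ballots are plain ints standing in for timeuuids. Because the epoch is only compared with `<` here, a per-replica commit counter could stand in for the commit timestamp without changing the rule.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class Commit:
    ballot: int   # proposal number of the committed value (a timeuuid in practice)
    value: str

@dataclass
class Acceptor:
    mrc: Commit                        # most recent commit this replica knows of
    promised: int = -1                 # highest proposal number promised so far
    accepted: Optional[Commit] = None  # in-progress accept, if any

    def prepare(self, ballot: int, proposer_mrc: int) -> Tuple[str, Optional[Commit]]:
        # The proposer's epoch is older than ours: refuse, and return our
        # commit so the proposer can learn it before retrying.
        if proposer_mrc < self.mrc.ballot:
            return ("nack", self.mrc)
        # Ordinary Paxos rule: only promise strictly higher proposal numbers.
        if ballot <= self.promised:
            return ("nack", None)
        self.promised = ballot
        return ("ok", self.accepted)

a = Acceptor(mrc=Commit(10, "v1"))
print(a.prepare(20, 0))    # proposer with a stale epoch is nacked with our commit
print(a.prepare(20, 10))   # proposer with an up-to-date epoch gets a promise
```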

I reworked the example above in more detail, and it seems correct:

{code}

R1           R2        R3
  Ct0          Ct0       Ct0       //initial state at t0
            Ptn(mrc=t0) <-         //R3 makes a proposal numbered tn with most recent commit t0
             --   ok   -->         //R2 promises
             Atn<     Atn <        //accept at tn > t0
                      Atn -> Ctn   //R3 commits Ctn, mrc=tn, accept is cleared
    ---> Ptn+m(mrc=t0) >           //R1 makes a proposal tn+m with mrc=t0, the last commit it knows of
    <--- nack (Ctn)                //R3 rejects since mrc is stale; sends Ctn directly for R1 to learn
 Ctn                               //R1 learns Ctn
    ---> Ptn+m(mrc=tn)             //R1 proposes again at mrc=tn
    <- ok ----------------         //R3 promises since mrc is up to date
>Atn+m                >Atn+m       //R3 accepts new value at tn+m > tn
>Ctn+m                             //R1 commits Ctn+m
{code}

State:
R1=(Ctn+m), R2=(Ct0,Atn), R3=(Ctn,Atn+m)
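A minimal Python sketch replaying the scenario step by step, to check that it ends in the state listed. It assumes hypothetical `Replica.prepare`/`accept`/`commit` methods (not Cassandra code), ints t0=0, tn=10, tn+m=15 in place of timeuuids, and made-up values `v0`, `vn`, `vnm`. Message passing is reduced to direct method calls, and a commit clears any accept at or below its ballot.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class Commit:
    ballot: int   # stands in for a timeuuid; only the ordering matters
    value: str

@dataclass
class Replica:
    mrc: Commit                        # most recent commit known locally
    promised: int = -1                 # highest proposal number promised
    accepted: Optional[Commit] = None  # in-progress accept, if any

    def prepare(self, ballot: int, proposer_mrc: int) -> Tuple[str, Optional[Commit]]:
        if proposer_mrc < self.mrc.ballot:
            return ("nack", self.mrc)  # stale proposer: send it our commit to learn
        if ballot <= self.promised:
            return ("nack", None)
        self.promised = ballot
        return ("ok", self.accepted)

    def accept(self, proposal: Commit) -> bool:
        if proposal.ballot < self.promised:
            return False
        self.promised = proposal.ballot
        self.accepted = proposal
        return True

    def commit(self, c: Commit) -> None:
        if c.ballot > self.mrc.ballot:
            self.mrc = c
        if self.accepted is not None and self.accepted.ballot <= c.ballot:
            self.accepted = None       # the commit supersedes the accept

T0, TN, TNM = 0, 10, 15                # t0 < tn < tn+m
r1, r2, r3 = (Replica(Commit(T0, "v0")) for _ in range(3))

# R3 proposes Ptn(mrc=t0); R2 (and R3 itself) promise, then both accept Atn
r3.prepare(TN, T0)
r2.prepare(TN, T0)
atn = Commit(TN, "vn")
r2.accept(atn)
r3.accept(atn)
r3.commit(atn)                         # R3 commits Ctn; its accept is cleared

# R1 proposes Ptn+m with its stale mrc=t0; R3 nacks and hands over Ctn
status, learned = r3.prepare(TNM, T0)
r1.commit(learned)                     # R1 learns Ctn

# R1 retries at mrc=tn; R1 and R3 promise, then both accept Atn+m
r1.prepare(TNM, TN)
r3.prepare(TNM, TN)
atnm = Commit(TNM, "vnm")
r1.accept(atnm)
r3.accept(atnm)
r1.commit(atnm)                        # R1 commits Ctn+m

for name, r in (("R1", r1), ("R2", r2), ("R3", r3)):
    print(name, r.mrc, r.accepted)
```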

I think this is now essentially the same as the version-counter variant above.

To do a consistent read, the read may first have to complete the Paxos round for Atn+m, but it is guaranteed to resolve to Ctn+m whichever quorum it reads.
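The read claim can be checked against the final state R1=(Ctn+m), R2=(Ct0,Atn), R3=(Ctn,Atn+m) with a small Python sketch. It assumes, hypothetically, that a read completes the newest accept that is newer than the newest commit in its quorum and otherwise returns that commit; a real read would re-propose the accepted value and wait for a quorum of accepts, which this skips. Ballots are again ints standing in for timeuuids, and `resolve` and `STATE` are illustrative names only.

```python
from itertools import combinations
from typing import Dict, Optional, Tuple

Proposal = Tuple[int, str]   # (ballot, value)

T0, TN, TNM = 0, 10, 15      # t0 < tn < tn+m; only the order matters

# Final replica states: (most recent commit, in-progress accept or None)
STATE: Dict[str, Tuple[Proposal, Optional[Proposal]]] = {
    "R1": ((TNM, "vnm"), None),
    "R2": ((T0, "v0"), (TN, "vn")),
    "R3": ((TN, "vn"), (TNM, "vnm")),
}

def resolve(quorum: Tuple[str, ...]) -> Proposal:
    """Value a consistent read returns after completing any unfinished round."""
    newest_commit = max(STATE[r][0] for r in quorum)
    # Accepts at or below the newest commit are stale and can be ignored.
    pending = [STATE[r][1] for r in quorum
               if STATE[r][1] is not None and STATE[r][1][0] > newest_commit[0]]
    return max(pending) if pending else newest_commit

for quorum in combinations(sorted(STATE), 2):
    print(quorum, resolve(quorum))
```

With this state, every 2-of-3 quorum resolves to (tn+m, "vnm"): {R2,R3} sees only Ctn as a commit, but R3's Atn+m is newer, so the read completes it.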

> Support CAS
> -----------
>
>                 Key: CASSANDRA-5062
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5062
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: API, Core
>            Reporter: Jonathan Ellis
>             Fix For: 2.0
>
>         Attachments: half-baked commit 1.jpg, half-baked commit 2.jpg, half-baked commit 3.jpg
>
>
> "Strong" consistency is not enough to prevent race conditions.  The classic 
> example is user account creation: we want to ensure usernames are unique, so 
> we only want to signal account creation success if nobody else has created 
> the account yet.  But naive read-then-write allows clients to race and both 
> think they have a green light to create.
