[jira] [Commented] (CASSANDRA-2774) one way to make counter delete work better

Yang Yang (JIRA) Wed, 15 Jun 2011 09:54:57 -0700

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-2774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049874#comment-13049874
 ]


Yang Yang commented on CASSANDRA-2774:
--------------------------------------

You are right that "it just cannot return 1 (at) *that time*": 0 or 2 is not a 
stable value, only the value that the system had at some past snapshot in time.

But it will eventually converge to the answer 1:

Since our edge case above assumes that B has not received the deletion yet, the 
leader of the second increment cannot be A: otherwise B would already have 
received the deletion from A, because on A the increment comes after the 
deletion. So B was the leader of the second increment.


As for C, it is now on the new epoch. Say A's second increment reaches C 
(through repair, since A is not the leader of the second increment): this 
increment carries the new epoch, so C accepts it. If B's second increment 
reaches C, it belongs to the old epoch and is rejected.

As for B, it is still on the old epoch. After the second increment, B's count 
is 2 in the old epoch. But when A's increment reaches B through repair, or is 
reconciled with B on read, the result is 1. If C's deletion reaches B, B is 
brought up to date to a value of 0 in the new epoch.
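
The three cases above can be sketched as a small model. This is only an 
illustration, not the code in counter_delete.diff: the (epoch, per-leader 
counts) representation, the function names, the choice of A as leader of the 
first increment, and the same-epoch rule (take the larger count per leader) 
are all assumptions made for the example.

```python
from itertools import permutations
from typing import Dict, Tuple

# Hypothetical model: (epoch, partial counts keyed by the leader of each increment).
State = Tuple[int, Dict[str, int]]

def increment(state: State, leader: str, delta: int = 1) -> State:
    epoch, shards = state
    updated = dict(shards)
    updated[leader] = updated.get(leader, 0) + delta
    return (epoch, updated)

def delete(state: State) -> State:
    # A delete starts a new, empty incarnation of the counter.
    return (state[0] + 1, {})

def reconcile(a: State, b: State) -> State:
    if a[0] != b[0]:
        # State from the older epoch is rejected outright.
        return a if a[0] > b[0] else b
    merged = dict(a[1])
    for leader, count in b[1].items():
        # Assumed same-epoch rule: keep the larger count per leader.
        merged[leader] = max(merged.get(leader, 0), count)
    return (a[0], merged)

def value(state: State) -> int:
    return sum(state[1].values())

first = increment((0, {}), "A")      # first increment, seen by all replicas
b = increment(first, "B")            # B never saw the delete: count 2, old epoch
c = delete(first)                    # C saw the delete: count 0, new epoch
a = increment(delete(first), "B")    # A: delete, then the B-led increment in the new epoch

# Reconcile the three replicas in every possible order.
results = {value(reconcile(reconcile(x, y), z)) for x, y, z in permutations([a, b, c])}
print(value(b), value(c), value(a), results)
```

In this model B alone reads 2, C alone reads 0, and every reconciliation order 
of the three replicas ends at 1, which is the "eventually 1" behavior argued 
for above.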



The above analysis does not cover all possible scenarios; to give a 
definitive proof of the conjecture that "all nodes return *the* ordering given 
by the client, in case of quorum read/write", I need to think more. 

But as I stated in my last comment, at least we can be sure that the new 
approach guarantees *some* common agreement eventually. It would be nice to 
achieve *the* agreement in the quorum case, but that's not my main argument.

> one way to make counter delete work better
> ------------------------------------------
>
>                 Key: CASSANDRA-2774
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2774
>             Project: Cassandra
>          Issue Type: New Feature
>    Affects Versions: 0.8.0
>            Reporter: Yang Yang
>         Attachments: counter_delete.diff
>
>
> Counters currently do not work with delete, because different merge orders 
> of sstables produce different results, for example:
> add 1
> delete 
> add 2
> if the merging happens by 1-2, (1,2)--3  order, the result we see will be 2
> if merging is: 1--3, (1,3)--2, the result will be 3.
> The issue is that a delete currently cannot separate the adds that came 
> before it from the adds that come after it. A delete is supposed to create a 
> completely new incarnation of the counter, a new "lifetime" or "epoch". The 
> new approach uses an "epoch number" that each delete bumps up. Since each 
> write is replicated (replicate on write is almost always enabled in 
> practice; if this is a concern, we could further force ROW in the case of a 
> delete), the epoch number is global to a replica set.
> Changes are attached. Existing tests pass; some tests are modified since 
> the semantics changed a bit. Some CQL tests do not pass in the original 
> 0.8.0 source; that is not the fault of this change.
> see details at 
> http://mail-archives.apache.org/mod_mbox/cassandra-user/201106.mbox/%3cbanlktikqcglsnwtt-9hvqpseoo7sf58...@mail.gmail.com%3E
> The goal of this is to make delete work (at least with consistent behavior; 
> yes, in the case of a long network partition the behavior is not ideal, but 
> it is consistent with the definition of a logical clock), so that we could 
> have expiring Counters.
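
The order dependence in the quoted add 1 / delete / add 2 example can be 
reproduced with a deliberately simplified model. This is not Cassandra's 
actual merge code: the `Cell` type, its fields, and the rule "a tombstone wins 
only over a cell with an older timestamp" are assumptions chosen to mirror the 
description.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Cell:
    ts: int                # newest logical timestamp folded into this cell
    value: int = 0         # accumulated count (unused for tombstones)
    deleted: bool = False  # True for a delete marker

def merge(a: Cell, b: Cell) -> Cell:
    """Simplified pre-epoch merge: adds sum up, a tombstone beats only older cells."""
    if a.deleted and b.deleted:
        return a if a.ts >= b.ts else b
    if a.deleted or b.deleted:
        tomb, live = (a, b) if a.deleted else (b, a)
        return live if live.ts > tomb.ts else tomb
    # Two live cells: sum the counts and keep the newest timestamp.
    return Cell(ts=max(a.ts, b.ts), value=a.value + b.value)

add1 = Cell(ts=1, value=1)
delete = Cell(ts=2, deleted=True)
add2 = Cell(ts=3, value=2)

order_a = merge(merge(add1, delete), add2)   # 1-2, then (1,2)--3
order_b = merge(merge(add1, add2), delete)   # 1--3, then (1,3)--2
print(order_a.value, order_b.value)          # prints "2 3"
```

Once the two adds are summed into one cell with the newer timestamp, the 
delete can no longer remove only the earlier add. That is the information an 
epoch number on each write would preserve.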

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
