[ 
https://issues.apache.org/jira/browse/CASSANDRA-1072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900090#action_12900090
 ] 

Kelvin Kakugawa commented on CASSANDRA-1072:
--------------------------------------------

Yes, let me rebase and create the patch properly.


The problem w/ #580 is that vector clocks only provide a partial ordering of 
the logical updates.  However, for distributed commutative operations, we need 
to: 1) partition the updates and, 2) be able to idempotently repair updates 
between partitions, as well.  Bear in mind that we have 2 proposed 
implementations of #580: 1) node ids are chosen by the replica that updates the 
value, and 2) node ids are chosen by the client connection that updates the 
value.


Let's take an example using #580 (replica-based node ids):
The current value + vector clock is:
     10 + [(A, 5), (B, 3), (C, 4)]

on all the replicas (A, B, and C).  Now, let's say we have 2 clients, X and Y, 
that are trying to update the value.
X reads from replica A, then increments the value by 2 and writes it to A (the 
clock is pre-constructed on the coordinator):
    12 + [(A, 6), (B, 3), (C, 4)]

Y also reads from replica A (before X's update), then increments the value by 3 
and writes it to A (the clock is pre-constructed on the coordinator):
    13 + [(A, 6), (B, 3), (C, 4)]

Now, replica A sees both variants.  However, it doesn't have enough information 
to properly reconcile both columns to arrive at 15.


Let's take an example using #580 (connection-based node ids):
The current value + vector clock is:
     10 + [(X, 5), (Y, 3)]

on all the replicas (A, B, and C).  X and Y were previous client connections 
that updated this value.  Now, let's say we have a client Z that tries to 
update this value twice.
Z reads from replica A, then increments by 2 and writes it to B (clock is 
pre-constructed on coordinator):
    12 + [(X, 5), (Y, 3), (Z, 1)]

Next, Z reads from replica C, then increments by 3 and writes it to B (clock is 
pre-constructed on coordinator):
    13 + [(X, 5), (Y, 3), (Z, 1)]

As in the previous example, replica B does not have enough information to 
properly reconcile both columns.


Further note, in both examples above, a CL.QUORUM read, then CL.QUORUM write 
would not fix the reconciliation problem.  #580 does not facilitate the 
partitioning of updates, so the reconciler does not know which portion of the 
value is "new" when it sees two updates w/ the same version vector.  If updates 
aren't partitioned, then a distributed lock system, like Cages, would be needed 
to enforce exact ordering throughout the distributed system.


If you are doing a set union, like in the Dynamo shopping cart example, then a 
partial ordering is sufficient and similar items in the shopping cart can be 
collapsed.

> Increment counters
> ------------------
>
>                 Key: CASSANDRA-1072
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1072
>             Project: Cassandra
>          Issue Type: Sub-task
>          Components: Core
>            Reporter: Johan Oskarsson
>            Assignee: Kelvin Kakugawa
>         Attachments: CASSANDRA-1072-2.patch, CASSANDRA-1072-2.patch, 
> CASSANDRA-1072.patch, Incrementcountersdesigndoc.pdf
>
>
> Break out the increment counters out of CASSANDRA-580. Classes are shared 
> between the two features but without the plain version vector code the 
> changeset becomes smaller and more manageable.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to