[
https://issues.apache.org/jira/browse/CASSANDRA-1072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900090#action_12900090
]
Kelvin Kakugawa commented on CASSANDRA-1072:
--------------------------------------------
Yes, let me rebase and create the patch properly.
The problem with #580 is that vector clocks only provide a partial ordering of
the logical updates. However, for distributed commutative operations, we also
need to: 1) partition the updates, and 2) be able to idempotently repair
updates between partitions. Bear in mind that we have two proposed
implementations of #580: 1) node ids are chosen by the replica that updates the
value, and 2) node ids are chosen by the client connection that updates the
value.
Let's take an example using #580 (replica-based node ids):
The current value + vector clock is:
10 + [(A, 5), (B, 3), (C, 4)]
on all the replicas (A, B, and C). Now, let's say we have 2 clients, X and Y,
that are trying to update the value.
X reads from replica A, then increments the value by 2 and writes it to A (the
clock is pre-constructed on the coordinator):
12 + [(A, 6), (B, 3), (C, 4)]
Y also reads from replica A (before X's update), then increments the value by 3
and writes it to A (the clock is pre-constructed on the coordinator):
13 + [(A, 6), (B, 3), (C, 4)]
Now, replica A sees both variants. However, both carry the identical clock
[(A, 6), (B, 3), (C, 4)], so neither dominates the other, and A doesn't have
enough information to properly reconcile the two columns to arrive at 15.
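A minimal Java sketch of the collision (the maps below stand in for the column's clock; they're illustrative, not Cassandra's actual column/clock classes):

```java
import java.util.Map;

public class SameClockDemo {
    public static void main(String[] args) {
        // Both X's and Y's writes arrive at replica A carrying the same clock.
        Map<String, Integer> clockFromX = Map.of("A", 6, "B", 3, "C", 4);
        Map<String, Integer> clockFromY = Map.of("A", 6, "B", 3, "C", 4);
        int valueFromX = 12; // 10 + 2
        int valueFromY = 13; // 10 + 3

        // Neither clock dominates the other -- they are equal -- so a
        // clock-based reconciler can only keep one variant.
        System.out.println(clockFromX.equals(clockFromY)); // true

        // Picking either variant (e.g. the larger value) drops an increment:
        // the true total should be 10 + 2 + 3 = 15.
        System.out.println(Math.max(valueFromX, valueFromY)); // 13, not 15
    }
}
```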
Let's take an example using #580 (connection-based node ids):
The current value + vector clock is:
10 + [(X, 5), (Y, 3)]
on all the replicas (A, B, and C). X and Y were previous client connections
that updated this value. Now, let's say we have a client Z that tries to
update this value twice.
Z reads from replica A, then increments by 2 and writes it to B (clock is
pre-constructed on coordinator):
12 + [(X, 5), (Y, 3), (Z, 1)]
Next, Z reads from replica C, then increments by 3 and writes it to B (clock is
pre-constructed on coordinator):
13 + [(X, 5), (Y, 3), (Z, 1)]
As in the previous example, replica B does not have enough information to
properly reconcile the two columns: both carry the identical clock
[(X, 5), (Y, 3), (Z, 1)], even though each represents a distinct increment.
Note, further, that in both examples above a CL.QUORUM read followed by a
CL.QUORUM write would not fix the reconciliation problem. #580 does not
facilitate the partitioning of updates, so the reconciler does not know which
portion of the value is "new" when it sees two updates with the same version
vector. If updates aren't partitioned, then a distributed lock system, like
Cages, would be needed to enforce exact ordering throughout the distributed
system.
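For contrast, here is a sketch of what partitioning the updates buys. This is an illustrative per-node counter (invented names, not the patch's actual classes): each node owns one slot that only it increments, so divergent copies can be repaired idempotently by an entry-wise max, and the value is the sum of the slots.

```java
import java.util.HashMap;
import java.util.Map;

public class PartitionedCounterSketch {
    // Entry-wise max is idempotent and order-independent, because each
    // node only ever increases its own slot.
    static Map<String, Long> repair(Map<String, Long> a, Map<String, Long> b) {
        Map<String, Long> out = new HashMap<>(a);
        b.forEach((node, count) -> out.merge(node, count, Math::max));
        return out;
    }

    static long value(Map<String, Long> slots) {
        return slots.values().stream().mapToLong(Long::longValue).sum();
    }

    public static void main(String[] args) {
        // A stale copy (total 10) and a copy where node A has applied
        // X's +2 and Y's +3 to its own slot.
        Map<String, Long> stale = Map.of("A", 3L, "B", 3L, "C", 4L);
        Map<String, Long> fresh = Map.of("A", 8L, "B", 3L, "C", 4L);

        System.out.println(value(repair(stale, fresh))); // 15
        // Repairing twice changes nothing -- no lock or total order needed.
        System.out.println(value(repair(fresh, repair(stale, fresh)))); // 15
    }
}
```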
If you are doing a set union, like in the Dynamo shopping cart example, then a
partial ordering is sufficient and similar items in the shopping cart can be
collapsed.
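A minimal sketch of that union case (the item names are invented):

```java
import java.util.Set;
import java.util.TreeSet;

public class CartUnionDemo {
    public static void main(String[] args) {
        // Two concurrent cart versions whose vector clocks conflict.
        Set<String> fromX = Set.of("book", "lamp");
        Set<String> fromY = Set.of("book", "mug");

        // Union is commutative and idempotent, so a partial ordering is
        // enough: the reconciler just merges, and the shared item
        // ("book") collapses rather than double-counting.
        Set<String> merged = new TreeSet<>(fromX);
        merged.addAll(fromY);
        System.out.println(merged); // [book, lamp, mug]
    }
}
```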
> Increment counters
> ------------------
>
> Key: CASSANDRA-1072
> URL: https://issues.apache.org/jira/browse/CASSANDRA-1072
> Project: Cassandra
> Issue Type: Sub-task
> Components: Core
> Reporter: Johan Oskarsson
> Assignee: Kelvin Kakugawa
> Attachments: CASSANDRA-1072-2.patch, CASSANDRA-1072-2.patch,
> CASSANDRA-1072.patch, Incrementcountersdesigndoc.pdf
>
>
> Break out the increment counters out of CASSANDRA-580. Classes are shared
> between the two features but without the plain version vector code the
> changeset becomes smaller and more manageable.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.