[
https://issues.apache.org/jira/browse/CASSANDRA-2369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049855#comment-13049855
]
Jonathan Ellis commented on CASSANDRA-2369:
-------------------------------------------
No, it's worse than that.
Let me give an example of a simple multi-node, multi-DC cluster: nodes A and M
in DC1, nodes B and N in DC2. So nodes A, M, B, and N have keys in ranges
(M, A], (A, M], (N, B], (B, N], respectively.
If I write a row K with NTS {DC1: 1, DC2: 1}, then I know it will be on nodes M
and N. So far so good.
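
A minimal sketch of that placement, assuming a toy model in which tokens are
single letters, each DC is treated as its own ring, and there is one replica
per DC. The names RINGS and replicas are hypothetical and only illustrate the
idea; this is not Cassandra's actual NetworkTopologyStrategy code.

```python
from bisect import bisect_left

# Hypothetical toy model: one sorted token ring per datacenter.
RINGS = {"DC1": ["A", "M"], "DC2": ["B", "N"]}

def replicas(key_token, rf_per_dc):
    """Return the nodes that hold key_token, walking each DC's ring clockwise."""
    owners = []
    for dc, ring in RINGS.items():
        # The first node whose token is >= the key owns the range (prev, node];
        # the modulo wraps keys past the last token back to the first node.
        start = bisect_left(ring, key_token)
        count = min(rf_per_dc.get(dc, 0), len(ring))
        for i in range(count):
            owners.append(ring[(start + i) % len(ring)])
    return owners

print(replicas("K", {"DC1": 1, "DC2": 1}))  # ['M', 'N']
```

The point is that the token alone is enough to compute the full replica set.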
What if I now repair node M? It knows it has to compare its data for range (A,
B] with the data on node B, and range (B, M] with the data on node N. So it
builds a Merkle tree for each range and requests that B and N do so as well.
Then they exchange trees to see if things are in sync.
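
The comparison can be sketched roughly as follows. This is an illustration
only: a single SHA-256 digest per range stands in for a real Merkle tree
(which would also localize the differences), and the row data and the
range_digest helper are made up for the example.

```python
import hashlib

def range_digest(rows, lo, hi):
    """Hash every row whose token falls in (lo, hi].

    A stand-in for a Merkle tree root; handles non-wrapping ranges only.
    """
    h = hashlib.sha256()
    for token in sorted(rows):
        if lo < token <= hi:
            h.update(f"{token}={rows[token]};".encode())
    return h.hexdigest()

# Hypothetical contents of each node, keyed by token.
node_m = {"AB": "v1", "K": "v2"}
node_b = {"AB": "v1"}
node_n = {"K": "stale"}

# M compares (A, B] with B and (B, M] with N, one digest per range.
in_sync = {
    "B": range_digest(node_m, "A", "B") == range_digest(node_b, "A", "B"),
    "N": range_digest(node_m, "B", "M") == range_digest(node_n, "B", "M"),
}
print(in_sync)  # {'B': True, 'N': False}
```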
How does this change if we introduce this partitioner? M can no longer assume
that keys it has for range (A, B] should also be replicated to node B, and vice
versa. You would have to build a separate tree for each replica, i.e. instead
of just a tree for (A, B], each replica would need to build a tree for (A,
B]-that-belongs-on-M, and another tree for (A, B]-that-belongs-on-B, and so
forth for as many possible replicas as exist.
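
A rough sketch of why the number of trees grows, under assumptions: the
per-key rule replicas_for_key below is invented purely for illustration, and
a flat digest again stands in for a Merkle tree.

```python
import hashlib
from collections import defaultdict

def digests_per_replica(rows, lo, hi, replicas_for_key):
    """Build one digest per target replica for the rows in (lo, hi]."""
    buckets = defaultdict(list)
    for token in sorted(rows):
        if lo < token <= hi:
            # With per-key placement, every row must be asked where it belongs.
            for node in replicas_for_key(token):
                buckets[node].append(f"{token}={rows[token]};")
    return {
        node: hashlib.sha256("".join(items).encode()).hexdigest()
        for node, items in buckets.items()
    }

def replicas_for_key(token):
    # Hypothetical per-key rule, not derived from the token's ring position.
    return ["M", "B"] if token.endswith("1") else ["M", "N"]

rows = {"A1": "x", "A2": "y", "AB": "z"}  # all tokens fall in (A, B]
trees = digests_per_replica(rows, "A", "B", replicas_for_key)
print(sorted(trees))  # ['B', 'M', 'N']
```

One range that previously needed a single tree now needs one per possible
replica.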
There is a similar problem on bootstrap and node movement. Instead of asking a
single replica to stream data from the range a new node is assuming, it will
have to ask _all_ replicas that may have rows for that range to make sure it
doesn't miss any.
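
The difference in streaming sources can be sketched like this. The function
and its parameters are hypothetical and only restate the argument above; this
is not how Cassandra's bootstrap code is structured.

```python
def stream_sources(range_owner, all_nodes, per_key_placement):
    """Return the nodes a joining node must contact for one range."""
    if per_key_placement:
        # Any other node might hold a row for the range, so none can be skipped.
        return sorted(n for n in all_nodes if n != "joining")
    # Token-based placement: the current owner of the range has every row.
    return [range_owner]

nodes = ["A", "B", "M", "N", "joining"]
print(stream_sources("M", nodes, per_key_placement=False))  # ['M']
print(stream_sources("M", nodes, per_key_placement=True))   # ['A', 'B', 'M', 'N']
```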
> support replication decisions per-key
> -------------------------------------
>
> Key: CASSANDRA-2369
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2369
> Project: Cassandra
> Issue Type: New Feature
> Reporter: Jonathan Ellis
> Assignee: Vijay
> Priority: Minor
> Fix For: 1.0
>
>
> Currently the replication strategy gets a token and a keyspace with which to
> decide how to place replicas. For per-row replication this is insufficient
> because tokenization is lossy (CASSANDRA-1034).