[ 
https://issues.apache.org/jira/browse/CASSANDRA-2369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049855#comment-13049855
 ] 

Jonathan Ellis commented on CASSANDRA-2369:
-------------------------------------------

No, it's worse than that.

Let me give an example of a simple multi-node, multi-DC cluster: nodes A and M 
in DC1, nodes B and N in DC2.  So nodes A, M, B, and N have keys in ranges 
(M, A], (A, M], (N, B], (B, N], respectively.
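
As a rough illustration of the per-DC ranges above, here is a minimal Python
sketch. The numeric tokens are made up, chosen only so that A < B < M < N on
the ring, and the helper names are hypothetical. It assumes one replica per
DC and is not Cassandra's actual placement code.

```python
from bisect import bisect_left

# Hypothetical tokens: only their relative order (A < B < M < N) matters.
TOKENS = {"A": 10, "B": 20, "M": 30, "N": 40}
DCS = {"DC1": ["A", "M"], "DC2": ["B", "N"]}

def owner_in_dc(dc, key_token):
    """Return the first node in `dc` at or after key_token, wrapping around."""
    nodes = sorted(DCS[dc], key=TOKENS.get)
    tokens = [TOKENS[n] for n in nodes]
    i = bisect_left(tokens, key_token)  # first token >= key_token
    return nodes[i % len(nodes)]        # wrap past the last node to the first

def replicas(key_token):
    """One replica per datacenter (assumed replication factor of 1 per DC)."""
    return {dc: owner_in_dc(dc, key_token) for dc in DCS}
```

With these tokens, replicas(25) places a key from (B, M] on M in DC1 and on N
in DC2, and replicas(35) wraps around to A in DC1.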

If I write a row K with NTS {DC1: 1, DC2:2}, then I know it will be on nodes M 
and N. So far so good.

What if I now repair node M? It knows it has to compare its data for range (A, 
B] with that data on node B, and range (B, M] with that data on node N.  So it 
builds a merkle tree for each range, and requests that B and N do so as well, 
then they exchange trees to see if things are in sync.
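
The per-range comparison described above can be sketched as follows. This is
a simplified stand-in: it builds a plain binary hash tree over the rows of
each range and compares only the roots, whereas a real repair compares whole
trees to narrow down the mismatching sub-ranges. The integer tokens and the
function names are assumptions for illustration.

```python
import hashlib

def merkle_root(rows):
    """Root hash of a binary hash tree over rows sorted by token."""
    level = [hashlib.sha256(f"{k}={v}".encode()).digest()
             for k, v in sorted(rows.items())]
    if not level:
        return hashlib.sha256(b"").digest()
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # duplicate the last hash on odd levels
        level = [hashlib.sha256(level[i] + level[i + 1]).digest()
                 for i in range(0, len(level), 2)]
    return level[0]

def in_range(token, start, end):
    """True if token lies in the ring range (start, end], with wrap-around."""
    if start < end:
        return start < token <= end
    return token > start or token <= end

def out_of_sync(local, remote, ranges):
    """Return the ranges whose tree roots differ between two replicas."""
    bad = []
    for start, end in ranges:
        mine = {t: v for t, v in local.items() if in_range(t, start, end)}
        theirs = {t: v for t, v in remote.items() if in_range(t, start, end)}
        if merkle_root(mine) != merkle_root(theirs):
            bad.append((start, end))
    return bad
```

With hypothetical tokens A=10, B=20, M=30, node M would call out_of_sync with
B's rows for the range (10, 20) and with N's rows for the range (20, 30).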

How does this change if we introduce this partitioner? M can no longer assume 
that keys it has for range (A, B] should also be replicated to node B, and vice 
versa.  You would have to build a separate tree for each replica, i.e. instead 
of just a tree for (A, B], each replica would need to build a tree for (A, 
B]-that-belongs-on-M, and another tree for (A, B]-that-belongs-on-B, and so 
forth for as many possible replicas as exist.
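
To make the extra work concrete, here is a hypothetical sketch of how the
rows of a single range would have to be split before any trees are built.
The per-key placement rule (per_key_rule) and its "eu:" prefix are invented
purely for illustration; nothing like it exists in Cassandra.

```python
from collections import defaultdict

def rows_per_replica(rows, replicas_for_key):
    """Group the rows of one range by intended replica: one tree per group."""
    groups = defaultdict(dict)
    for key, value in rows.items():
        for node in replicas_for_key(key):
            groups[node][key] = value
    return dict(groups)

def per_key_rule(key):
    """Invented placement rule: the replica set depends on the key itself."""
    return ["M", "B"] if key.startswith("eu:") else ["M", "N"]

# Rows that all fall in the same token range on node M.
rows = {"eu:1": "a", "us:2": "b", "eu:3": "c"}
groups = rows_per_replica(rows, per_key_rule)
```

Here groups has three entries (M, B, and N), so one range now needs three
trees where token-based placement needed one.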

There is a similar problem on bootstrap and node movement.  Instead of asking a 
single replica to stream data from the range a new node is assuming, it will 
have to ask _all_ replicas that may have rows for that range to make sure it 
doesn't miss any.

> support replication decisions per-key
> -------------------------------------
>
>                 Key: CASSANDRA-2369
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2369
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: Jonathan Ellis
>            Assignee: Vijay
>            Priority: Minor
>             Fix For: 1.0
>
>
> Currently the replicationstrategy gets a token and a keyspace with which to 
> decide how to place replicas.  For per-row replication this is insufficient 
> because tokenization is lossy (CASSANDRA-1034).

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
