[ 
https://issues.apache.org/jira/browse/CASSANDRA-7592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15280680#comment-15280680
 ] 

Alex Petrov commented on CASSANDRA-7592:
----------------------------------------

I was able to reproduce the issue in a few simple steps:

  * Create a cluster with 3 nodes, set RF to 3, and insert some data into the
cluster.
  * Disable gossip on {{node1}}.
  * Bring in {{node4}}. {{node1}} will be unaware of the ring change.
  * Run {{select *}}.

I've also built a simple prototype that checks, when the coordinator sends
requests to replicas, whether the requested token or range belongs to (or
intersects with) the current node. With it, I was able to fail the
coordinator request for that node. (With a different replication factor the
request should still succeed, since the coordinator can still reach enough
replicas.)
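
A minimal sketch of the kind of check described above, written in Python for
brevity rather than as Cassandra code. {{TokenRange}}, {{NotOwnedError}},
{{check_token}} and {{check_range}} are hypothetical names, and the sketch
assumes ranges are of the form (left, right] and wrap around the ring when
left >= right.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class TokenRange:
    """Ring range (left, right]; assumed to wrap around when left >= right."""
    left: int
    right: int

    def contains(self, token: int) -> bool:
        if self.left < self.right:
            return self.left < token <= self.right
        # Wrapping range: covers tokens after `left` and tokens up to `right`.
        return token > self.left or token <= self.right

    def intersects(self, other: "TokenRange") -> bool:
        # Two (left, right] arcs overlap iff one contains the other's right end.
        return other.contains(self.right) or self.contains(other.right)


class NotOwnedError(Exception):
    """Hypothetical error a replica raises for data it does not own."""


def check_token(local_ranges: Iterable[TokenRange], token: int) -> None:
    """Reject a single-token request that falls outside every local range."""
    if not any(r.contains(token) for r in local_ranges):
        raise NotOwnedError(f"token {token} is not owned by this node")


def check_range(local_ranges: Iterable[TokenRange], requested: TokenRange) -> None:
    """Reject a range request that does not intersect any local range."""
    if not any(r.intersects(requested) for r in local_ranges):
        raise NotOwnedError(f"range {requested} does not intersect this node")
```

For example, if a node's local ranges are {{[TokenRange(50, 300)]}}, then
{{check_token}} accepts token 75 and raises {{NotOwnedError}} for token 25,
so a coordinator with a stale view of the ring gets an explicit failure
instead of a silently empty result.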

Coordinator requests need to get a bit smarter: the coordinator has to keep
sending reads and mutations to the previous owner for the duration of the
"ring delay". As for the active announcement, my only question is what
happens when the coordinator cannot connect to the other nodes: in that case
it won't receive the message either actively or passively.
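
To make the "keep talking to the previous owner" idea concrete, here is a
rough Python sketch, not a proposed patch. {{OwnershipChange}}, {{targets}}
and {{required_acks}} are hypothetical names, the 30-second default for the
ring delay is an assumption, and the ack rule (quorum plus one ack per extra
old replica) is modelled on the "must succeed on 3" behaviour the issue
description gives for bootstrap.

```python
from dataclasses import dataclass
from typing import List

RING_DELAY_SECONDS = 30.0  # assumed default; treat as a tunable


@dataclass
class OwnershipChange:
    """What one coordinator knows about a range that just changed owners."""
    old_replicas: List[str]
    new_replicas: List[str]
    changed_at: float  # time at which this coordinator saw the change


def targets(change: OwnershipChange, now: float,
            ring_delay: float = RING_DELAY_SECONDS) -> List[str]:
    """Replicas to contact: new owners, plus old owners during the window."""
    if now - change.changed_at < ring_delay:
        extra = [r for r in change.old_replicas if r not in change.new_replicas]
        return change.new_replicas + extra
    return list(change.new_replicas)


def required_acks(rf: int, contacted: int) -> int:
    """Quorum of RF, plus one ack for every replica contacted beyond RF."""
    return rf // 2 + 1 + max(0, contacted - rf)
```

With the ring from the issue description, a change from A, B, C to X, A, B
seen at time 100 yields targets X, A, B, C (3 acks required) at time 110, and
just X, A, B (2 acks required) once the ring delay has elapsed.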

> Ownership changes can violate consistency
> -----------------------------------------
>
>                 Key: CASSANDRA-7592
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7592
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Richard Low
>
> CASSANDRA-2434 goes a long way to avoiding consistency violations when 
> growing a cluster. However, there is still a window when consistency can be 
> violated when switching ownership of a range.
> Suppose you have replication factor 3 and all reads and writes at quorum. The 
> first part of the ring looks like this:
> Z: 0
> A: 100
> B: 200
> C: 300
> Choose two random coordinators, C1 and C2. Then you bootstrap node X at token 
> 50.
> Consider the token range 0-50. Before bootstrap, this is stored on A, B, C. 
> During bootstrap, writes go to X, A, B, C (and must succeed on 3) and reads 
> choose two from A, B, C. After bootstrap, the range is on X, A, B.
> When the bootstrap completes, suppose C1 processes the ownership change at t1 
> and C2 at t4. Then the following can give an inconsistency:
> t1: C1 switches ownership.
> t2: C1 performs write, so sends write to X, A, B. A is busy and drops the 
> write, but it succeeds because X and B return.
> t3: C2 performs a read. It hasn’t done the switch and chooses A and C. 
> Neither got the write at t2 so null is returned.
> t4: C2 switches ownership.
> This could be solved by continuing writes to the old replica for some time 
> (maybe ring delay) after the ownership changes.
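
The t1-t4 timeline in the description can be replayed with a toy in-memory
model. This is only an illustrative Python sketch under simplifying
assumptions: replicas are plain dictionaries, there are no timestamps or read
repair, and {{write}}, {{read}} and {{scenario}} are hypothetical helpers.

```python
from typing import Dict, List, Optional, Set, Tuple


def write(stores: Dict[str, dict], replicas: List[str], dropped: Set[str],
          key: str, value: str, required: int) -> bool:
    """Apply the write on every replica that does not drop it; count acks."""
    acks = 0
    for node in replicas:
        if node in dropped:
            continue
        stores[node][key] = value
        acks += 1
    return acks >= required


def read(stores: Dict[str, dict], replicas: List[str], key: str) -> Optional[str]:
    """Return the value from the first contacted replica that has the key."""
    for node in replicas:
        if key in stores[node]:
            return stores[node][key]
    return None


def scenario(keep_old_replica: bool) -> Tuple[bool, Optional[str]]:
    stores = {node: {} for node in "XABC"}
    # t2: C1 has switched ownership; A is busy and drops the write.
    write_replicas = ["X", "A", "B"] + (["C"] if keep_old_replica else [])
    required = 3 if keep_old_replica else 2
    ok = write(stores, write_replicas, {"A"}, "k", "v", required)
    # t3: C2 has not switched yet and reads from the old replicas A and C.
    return ok, read(stores, ["A", "C"], "k")
```

{{scenario(False)}} returns {{(True, None)}}: the write succeeds, yet the
quorum read misses it. {{scenario(True)}} returns {{(True, "v")}}, because the
old replica C still received the write.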



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
