[
https://issues.apache.org/jira/browse/CASSANDRA-16271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17240697#comment-17240697
]
Paulo Motta commented on CASSANDRA-16271:
-----------------------------------------
Thanks for the patch Sam!
I think we don't need to check pending replicas in
{{assureSufficientLiveReplicas}}; they are only needed to ensure one extra
replica acknowledges the write in {{CL.blockForWrite}}, not to decide whether to
throw {{UnavailableException}} when not enough natural replicas are available.
I took this approach for trunk [in this
commit|https://github.com/pauloricardomg/cassandra/commit/4027b8614ffd898b83e547fa0e49352a8aa8a739]
and the tests are passing; please let me know what you think.
I think this should simplify the 3.X patches, as we don't need to filter live
replicas in the write handlers.
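To make the distinction concrete, here is a minimal Python model of the idea. It is only a sketch, not Cassandra code: the function names, the {{Unavailable}} class, and the assumption that each pending replica adds one required acknowledgement are all hypothetical stand-ins for illustration. The availability check looks at live natural replicas only, while the number of acks to wait for still includes pending replicas.

```python
class Unavailable(Exception):
    """Stand-in for Cassandra's UnavailableException (hypothetical model)."""


def required_for_cl(cl: str, rf: int) -> int:
    """Replicas a consistency level needs, ignoring pending replicas."""
    if cl == "ONE":
        return 1
    if cl == "QUORUM":
        return rf // 2 + 1
    if cl == "ALL":
        return rf
    raise ValueError(f"unsupported consistency level: {cl}")


def block_for_write(cl: str, rf: int, pending: int) -> int:
    """Acks the coordinator waits for: CL requirement plus pending replicas.

    Assumption: every pending replica adds exactly one required ack.
    """
    return required_for_cl(cl, rf) + pending


def assure_sufficient_live_replicas(cl: str, rf: int, live_natural: int) -> None:
    """Availability check that counts live natural replicas only."""
    required = required_for_cl(cl, rf)
    if live_natural < required:
        raise Unavailable(
            f"{cl}: need {required} natural replicas, {live_natural} alive"
        )


# Scenarios from the ticket: (CL, RF, live natural replicas, pending replicas)
scenarios = [("ALL", 3, 2, 1), ("ONE", 1, 0, 1), ("QUORUM", 3, 1, 1)]
for cl, rf, live_natural, pending in scenarios:
    try:
        assure_sufficient_live_replicas(cl, rf, live_natural)
        print(cl, "would wait for", block_for_write(cl, rf, pending), "acks")
    except Unavailable as e:
        print("Unavailable:", e)
```

In this model, {{block_for_write}} gives 4 for RF=3/CL=ALL and 2 for RF=1/CL=ONE, which matches the {{required_responses}} values in the timeout messages from the ticket, while all three scenarios are rejected up front by the natural-replica check instead of timing out.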
I really like the unit tests, but I personally prefer having them as individual
unit tests rather than a single one testing many cases (or maybe grouped in smaller
units), since it makes it easier to spot which test fails on CI. Feel free to
keep it if you prefer it this way. Also, can you add a regression dtest for the
scenarios in the ticket description?
> Writes timeout instead of failing on cluster with CL-1 replicas available
> during replace
> ----------------------------------------------------------------------------------------
>
> Key: CASSANDRA-16271
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16271
> Project: Cassandra
> Issue Type: Bug
> Components: Consistency/Coordination
> Reporter: Krishna Vadali
> Assignee: Sam Tunnicliffe
> Priority: Normal
> Attachments: sleep_before_replace.diff
>
>
> Writes timeout instead of failing on cluster with CL-1 replicas available
> during replace node operation.
> With Consistency Level ALL, we are observing Timeout exceptions during writes
> when (RF - 1) nodes are available in the cluster with one replace-node
> operation running. The coordinator expects RF + 1 responses, while only RF
> nodes (RF-1 nodes in UN and 1 node in UJ) are available in the cluster, so
> the write times out.
> The same problem happens on a keyspace with RF=1, CL=ONE and one replica
> being replaced. It also happens with RF=3, CL=QUORUM, one replica down and
> another being replaced.
> I believe the expected behavior is that the write should fail with
> UnavailableException since there are not enough NORMAL replicas to fulfill
> the request.
> h4. *Steps to reproduce:*
> Run a 3 node test cluster (call the nodes node1 (127.0.0.1), node2
> (127.0.0.2), node3 (127.0.0.3)):
> {code:java}
> ccm create test -v 3.11.3 -n 3 -s
> {code}
> Create test keyspaces with RF = 3 and RF = 1 respectively:
> {code:java}
> create keyspace rf3 with replication = {'class': 'SimpleStrategy',
> 'replication_factor': 3};
> create keyspace rf1 with replication = {'class': 'SimpleStrategy',
> 'replication_factor': 1};
> {code}
> Create a table test in both the keyspaces:
> {code:java}
> create table rf3.test ( pk int primary KEY, value int);
> create table rf1.test ( pk int primary KEY, value int);
> {code}
> Stop node node2:
> {code:java}
> ccm node2 stop
> {code}
> Create node node4:
> {code:java}
> ccm add node4 -i 127.0.0.4
> {code}
> Enable auto_bootstrap:
> {code:java}
> ccm node4 updateconf 'auto_bootstrap: true'
> {code}
> Ensure node4 does not have itself in its seeds list.
> Run a replace node to replace node2 (address 127.0.0.2 corresponds to node
> node2)
> {code:java}
> ccm node4 start --jvm_arg="-Dcassandra.replace_address=127.0.0.2"
> {code}
> While the replace-node operation is running, perform writes/reads with
> CONSISTENCY ALL; we observed TimeoutException.
> {code:java}
> cqlsh> CONSISTENCY ALL;
> cqlsh> insert into rf3.test (pk, value) values (16, 7);
> WriteTimeout: Error from server: code=1100 [Coordinator node timed out
> waiting for replica nodes' responses] message="Operation timed out - received
> only 3 responses." info={'received_responses': 3, 'required_responses': 4,
> 'consistency': 'ALL'}{code}
> {code:java}
> cqlsh> CONSISTENCY ONE;
> cqlsh> insert into rf1.test (pk, value) VALUES(5, 1);
> WriteTimeout: Error from server: code=1100 [Coordinator node timed out
> waiting for replica nodes' responses] message="Operation timed out - received
> only 1 responses." info={'received_responses': 1, 'required_responses': 2,
> 'consistency': 'ONE'}
> {code}
> Cluster State:
> {code:java}
> Datacenter: datacenter1
> =======================
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address    Load        Tokens  Owns (effective)  Host ID                               Rack
> UN  127.0.0.1  70.45 KiB   1       100.0%            4f652b22-045b-493b-8722-fb5f7e1723ce  rack1
> UN  127.0.0.3  70.43 KiB   1       100.0%            a0dcd677-bdb3-4947-b9a7-14f3686a709f  rack1
> UJ  127.0.0.4  137.47 KiB  1       ?                 e3d794f1-081e-4aba-94f2-31950c713846  rack1
> {code}
> Note:
> We introduced a sleep during the replace operation in order to run our
> experiments. The attached code diff does this.