[ 
https://issues.apache.org/jira/browse/CASSANDRA-15243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16892576#comment-16892576
 ] 

Benedict commented on CASSANDRA-15243:
--------------------------------------

See CASSANDRA-14723

> removenode can cause QUORUM write queries to fail
> -------------------------------------------------
>
>                 Key: CASSANDRA-15243
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15243
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Consistency/Coordination
>            Reporter: Tom van der Woerdt
>            Priority: Normal
>
> Looks like nobody has found this yet, so it may be a ticking time bomb for 
> some... :(
> This happened to me earlier today. On a Cassandra 3.11.4 cluster with three 
> DCs, one DC had three servers fail due to unexpected external circumstances. 
> Replication was NTS configured with 2:2:2.
> Cassandra dealt with the failures just fine - great! However, the nodes 
> failed in a way that would make bringing them back impossible, so I tried to 
> remove them using 'removenode'.
> Suddenly, the application started experiencing a large number of QUORUM write 
> timeouts. My first reflex was to lower the streaming throughput and 
> compaction throughput, since timeouts indicated some overload was happening. 
> No luck, though.
> I tried a bunch of other things to reroute queries away from the affected 
> datacenter, like changing the Severity field on the dynamic snitch. Still, no 
> luck.
> After a while I noticed one strange thing: the WriteTimeoutException listed 
> that five replicas were required, instead of the four you would expect to see 
> in a 2:2:2 replication configuration. I shrugged it off as some weird 
> inconsistency that was probably because of the use of batches.
> Skipping ahead a bit: I decided to let the streams run again and just wait 
> the issue out, since nothing I did was working, and maybe letting the 
> streams finish would resolve the overload. Magically, as soon as the streams 
> finished, the errors stopped.
> ----
> There are two issues here, both in AbstractWriteResponseHandler.java.
> h3. Cassandra sometimes waits for too many replicas on writes
> In 
> [totalBlockFor|https://github.com/apache/cassandra/blob/71cb0616b7710366a8cd364348c864d656dc5542/src/java/org/apache/cassandra/service/AbstractWriteResponseHandler.java#L124]
>  Cassandra will *always* include pending nodes in `blockFor`. For a QUORUM 
> query against a 2:2:2 replication configuration (RF=6, so a quorum of 4), 
> with two replicas in one DC down, a single pending node raises blockFor to 
> 5. If the pending replica is then also down (as can happen when removenode 
> is used and not all destination hosts are up), only 4 of the 5 required 
> hosts are available, and QUORUM queries can never succeed.
> h3. UnavailableException not thrown
>  While debugging this, I spent all my time treating this issue as if it 
> were a timeout. However, Cassandra was issuing queries that could never 
> succeed, because insufficient hosts were available. Throwing an 
> UnavailableException would have been more helpful. The issue here is caused 
> by 
> [assureSufficientLiveNodes|https://github.com/apache/cassandra/blob/71cb0616b7710366a8cd364348c864d656dc5542/src/java/org/apache/cassandra/service/AbstractWriteResponseHandler.java#L155]
>  which merely concatenates the lists of available nodes and does not 
> consider the special case of a pending node that is down.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
