[
https://issues.apache.org/jira/browse/CASSANDRA-15243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16893665#comment-16893665
]
Benedict commented on CASSANDRA-15243:
--------------------------------------
It is meant to be implied by the reference to write availability (the
persistent loss of which causes an outage). It is logically the case that range
movements decrease write availability and increase the likelihood of failures
causing an outage. The problem with the present approach, as you've
encountered, is that during these range movements we cannot necessarily
tolerate the same level of failure as we can when no range movements are in
flight, which can lead to surprising outages.
You're right that the other ticket doesn't explicitly mention the problem of
behaviours and failures that would have been isolated to one DC "leaking" to
the whole cluster, and it is probably worth adding.
> removenode can cause QUORUM write queries to fail
> -------------------------------------------------
>
> Key: CASSANDRA-15243
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15243
> Project: Cassandra
> Issue Type: Bug
> Components: Consistency/Coordination
> Reporter: Tom van der Woerdt
> Priority: Normal
>
> Looks like nobody found this yet, so this may be a ticking time bomb for
> some... :(
> This happened to me earlier today. On a Cassandra 3.11.4 cluster with three
> DCs, one DC had three servers fail due to unexpected external circumstances.
> Replication was NTS configured with 2:2:2.
> Cassandra dealt with the failures just fine - great! However, the nodes
> failed in a way that would make bringing them back impossible, so I tried to
> remove them using 'removenode'.
> Suddenly, the application started experiencing a large number of QUORUM write
> timeouts. My first reflex was to lower the streaming throughput and
> compaction throughput, since timeouts indicated some overload was happening.
> No luck, though.
> I tried a bunch of other things to reroute queries away from the affected
> datacenter, like changing the Severity field on the dynamic snitch. Still, no
> luck.
> After a while I noticed one strange thing: the WriteTimeoutException said
> that five replicas were required, instead of the four you would expect in a
> 2:2:2 replication configuration. I shrugged it off as some weird
> inconsistency, probably caused by the use of batches.
> Skipping ahead a bit: since nothing I did was working, I decided to let the
> streams run again and just wait the issue out, hoping that letting them
> finish would resolve the overload. Magically, as soon as the streams
> finished, the errors stopped.
> ----
> There are two issues here, both in AbstractWriteResponseHandler.java.
> h3. Cassandra sometimes waits for too many replicas on writes
> In
> [totalBlockFor|https://github.com/apache/cassandra/blob/71cb0616b7710366a8cd364348c864d656dc5542/src/java/org/apache/cassandra/service/AbstractWriteResponseHandler.java#L124]
> Cassandra will *always* include pending nodes in `blockFor`. For a QUORUM
> query at a 2:2:2 replication factor (total RF 6, quorum 4), the pending
> replica created by removenode pushes `blockFor` up to 5. If that pending
> replica is then also down (as can happen when removenode is used and not all
> destination hosts are up) while two replicas in one DC are down, only 4 live
> replicas can ever respond, and QUORUM queries will never succeed.
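> To make the arithmetic concrete, here is a small standalone sketch
> (illustrative only, not the actual Cassandra code; the class and method names
> are made up) of how a QUORUM write against RF 2:2:2 ends up with a blockFor
> of 5 once a pending replica from removenode is included:
> {code:java}
> // Hypothetical sketch, not Cassandra source: shows how adding pending
> // replicas on top of the quorum yields blockFor = 5 in this scenario.
> public class BlockForSketch {
>     // Quorum over the total replication factor across all DCs.
>     static int quorumFor(int totalRf) {
>         return totalRf / 2 + 1;
>     }
>
>     // Pending replicas (e.g. the destination of a removenode range movement)
>     // are always added on top of the quorum.
>     static int totalBlockFor(int totalRf, int pendingReplicas) {
>         return quorumFor(totalRf) + pendingReplicas;
>     }
>
>     public static void main(String[] args) {
>         int totalRf = 2 + 2 + 2;  // NTS 2:2:2 across three DCs
>         int pending = 1;          // one pending replica from the in-flight removenode
>         // quorum(6) = 4, so blockFor = 4 + 1 = 5; with only 4 replicas alive,
>         // the write can never gather enough acks and times out instead.
>         System.out.println("blockFor = " + totalBlockFor(totalRf, pending));
>     }
> }
> {code}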
> h3. UnavailableException not thrown
> While debugging this, I spent all my time treating this issue as if it were
> a timeout. However, Cassandra was running queries that could never succeed,
> because insufficient hosts were available. Throwing an UnavailableException
> would have been more helpful. The issue here is caused by
> [assureSufficientLiveNodes|https://github.com/apache/cassandra/blob/71cb0616b7710366a8cd364348c864d656dc5542/src/java/org/apache/cassandra/service/AbstractWriteResponseHandler.java#L155],
> which merely concatenates the lists of available nodes and doesn't consider
> the special case of a pending node that's down.
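> As a thought experiment, here is a hypothetical fail-fast check (illustrative
> only; the names are made up and this is not a proposed patch) that counts
> live pending replicas against blockFor and refuses up front when the write
> can never be acknowledged, instead of letting it time out:
> {code:java}
> import java.util.List;
>
> // Hypothetical sketch, not Cassandra source: fail fast when fewer live
> // replicas exist than blockFor requires, pending replicas included.
> public class AvailabilityCheckSketch {
>     // A replica endpoint with a liveness flag, for illustration only.
>     record Replica(String endpoint, boolean alive) {}
>
>     // Because blockFor counts pending replicas, the liveness check must count
>     // live pending replicas too. If fewer live candidates exist than blockFor,
>     // the request can never succeed; Cassandra would throw UnavailableException
>     // here rather than the generic exception used in this sketch.
>     static void assureSufficientLiveReplicas(List<Replica> natural,
>                                              List<Replica> pending,
>                                              int blockFor) {
>         long live = natural.stream().filter(Replica::alive).count()
>                   + pending.stream().filter(Replica::alive).count();
>         if (live < blockFor)
>             throw new IllegalStateException(
>                 "Unavailable: need " + blockFor + " replicas but only " + live + " are live");
>     }
>
>     public static void main(String[] args) {
>         // 2:2:2 with two replicas in one DC down and the pending replica also down:
>         List<Replica> natural = List.of(
>             new Replica("dc1-a", true),  new Replica("dc1-b", true),
>             new Replica("dc2-a", true),  new Replica("dc2-b", true),
>             new Replica("dc3-a", false), new Replica("dc3-b", false));
>         List<Replica> pending = List.of(new Replica("dc3-c", false));
>         assureSufficientLiveReplicas(natural, pending, 5); // throws: only 4 live < 5
>     }
> }
> {code}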