[
https://issues.apache.org/jira/browse/CASSANDRA-15243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tom van der Woerdt updated CASSANDRA-15243:
-------------------------------------------
Description:
Looks like nobody found this yet so this may be a ticking time bomb for some...
:(
This happened to me earlier today. On a Cassandra 3.11.4 cluster with three
DCs, one DC had three servers fail due to unexpected external circumstances.
Replication used NetworkTopologyStrategy (NTS) configured as 2:2:2.
Cassandra dealt with the failures just fine - great! However, the nodes had
failed in a way that made bringing them back impossible, so I tried to remove
them using 'removenode'.
Suddenly, the application started experiencing a large number of QUORUM write
timeouts. My first reflex was to lower the streaming throughput and compaction
throughput, since timeouts indicated some overload was happening. No luck,
though.
I tried a bunch of other things to reroute queries away from the affected
datacenter, like changing the Severity field on the dynamic snitch. Still, no
luck.
After a while I noticed one strange thing: the WriteTimeoutException reported
that five replicas were required, instead of the four you would expect to see
in a 2:2:2 replication configuration. I shrugged it off as some weird
inconsistency, probably caused by the use of batches.
Skipping ahead a bit: since nothing I did was working, I decided to let the
streams run again and just wait the issue out, hoping that letting the streams
finish would resolve the overload. Magically, as soon as the streams finished,
the errors stopped.
----
There are two issues here, both in AbstractWriteResponseHandler.java.
h3. Cassandra sometimes waits for too many replicas on writes
In
[totalBlockFor|https://github.com/apache/cassandra/blob/71cb0616b7710366a8cd364348c864d656dc5542/src/java/org/apache/cassandra/service/AbstractWriteResponseHandler.java#L124],
Cassandra will *always* include pending nodes in {{blockFor}}. For a QUORUM
write at a 2:2:2 replication factor, with two replicas in one DC down, this
results in a {{blockFor}} of 5. If the pending replica is then also down (as
can happen when removenode is used and not all destination hosts are up), only
4 of the 5 required hosts are available, and QUORUM writes will never succeed.
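To make the arithmetic concrete, here is a minimal plain-Java sketch (not
actual Cassandra code; the class and variable names are mine) of how this
scenario ends up with a blockFor of 5 against only 4 live replicas:
{code:java}
// Standalone sketch of the scenario above: NTS 2:2:2, QUORUM writes, two
// replicas down in one DC, and one pending endpoint (from removenode) that is
// also down. None of this is Cassandra code; the names are illustrative only.
public class BlockForSketch
{
    public static void main(String[] args)
    {
        int replicationFactor = 2 + 2 + 2;        // RF summed across the three DCs
        int quorum = replicationFactor / 2 + 1;   // 6 / 2 + 1 = 4
        int pendingEndpoints = 1;                 // range movement created by removenode

        // totalBlockFor() always adds pending endpoints on top of the consistency level
        int blockFor = quorum + pendingEndpoints; // 4 + 1 = 5

        int liveNaturalReplicas = 4;              // two of the six natural replicas are down
        int livePendingReplicas = 0;              // the pending replica is down as well
        int liveTotal = liveNaturalReplicas + livePendingReplicas;

        // blockFor (5) exceeds the live replica count (4), so the coordinator can
        // never collect enough acks and every QUORUM write times out.
        System.out.println("blockFor = " + blockFor + ", live replicas = " + liveTotal);
    }
}
{code}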
h3. UnavailableException not thrown
While debugging this, I spent all my time treating the issue as if it were a
timeout. However, Cassandra was executing writes that could never succeed,
because insufficient hosts were available; throwing an UnavailableException
would have been far more helpful. The issue here is caused by
[assureSufficientLiveNodes|https://github.com/apache/cassandra/blob/71cb0616b7710366a8cd364348c864d656dc5542/src/java/org/apache/cassandra/service/AbstractWriteResponseHandler.java#L155],
which merely concatenates the lists of live natural and pending endpoints and
does not account for the special case of a pending node that is down.
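A similar sketch of why no UnavailableException is thrown, based on my reading
of the code (again not the actual implementation): the availability check only
requires the plain consistency-level count of live endpoints, excluding pending
nodes, while the write path waits for totalBlockFor, which includes them:
{code:java}
// Standalone sketch, not Cassandra code: the availability check and the write
// path use two different thresholds, and only the write path counts pending
// endpoints.
public class AvailabilityCheckSketch
{
    public static void main(String[] args)
    {
        int quorum = 4;            // QUORUM requirement at total RF 6
        int pendingEndpoints = 1;  // pending endpoint created by removenode
        int liveEndpoints = 4;     // live natural + live pending (the pending node is down)

        // assureSufficientLiveNodes() effectively checks the live endpoints against
        // the consistency level alone, so a down pending node does not trip it:
        boolean availabilityCheckPasses = liveEndpoints >= quorum;  // 4 >= 4 -> true

        // ... while the response handler waits for totalBlockFor() acks:
        int blockFor = quorum + pendingEndpoints;                   // 5

        // Result: no UnavailableException up front, but the write can never gather
        // 5 acks from 4 live nodes, so it times out instead.
        System.out.println("availability check passes: " + availabilityCheckPasses
                + ", acks required: " + blockFor
                + ", live endpoints: " + liveEndpoints);
    }
}
{code}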
> removenode can cause QUORUM write queries to fail
> -------------------------------------------------
>
> Key: CASSANDRA-15243
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15243
> Project: Cassandra
> Issue Type: Bug
> Components: Consistency/Coordination
> Reporter: Tom van der Woerdt
> Priority: Normal