Jaydeepkumar Chovatia created CASSANDRA-19958:
-------------------------------------------------
Summary: Hints are stepping on online mutations
Key: CASSANDRA-19958
URL: https://issues.apache.org/jira/browse/CASSANDRA-19958
Project: Cassandra
Issue Type: Bug
Components: Legacy/Local Write-Read Paths
Reporter: Jaydeepkumar Chovatia
Attachments: image-2024-09-26-15-28-20-435.png
Cassandra uses the same queue (Stage.MUTATION) to process local mutations as
well as local hint writes. CASSANDRA-19534 enhanced local mutations by adding
timeouts, but local hint writing does not honor that timeout by design, since
it honors a different timeout, i.e. _max_hint_window_in_ms_.
*The Problem*
Let's understand the problem with a five-node Cassandra cluster N1, N2, N3,
N4, N5 with the following configuration:
* concurrent_writes: 10
* native_transport_timeout: 5s
* write_request_timeout_in_ms: 2000 // 2 seconds

+StorageProxy.java snippet...+
!image-2024-09-26-15-28-20-435.png!
Let's assume N4 and N5 are slow, flapping, or down, and that N1 receives a
flurry of mutations. This is what happens on N1:
# Line no 1542: Append 100 hints to the Stage.Mutation queue
# Line no 1547: Append 100 local mutations to the Stage.Mutation queue
The Stage.MUTATION queue on N1 would then look as follows:
{code:java}
hint1,hint2,hint3,....hint100,mutation1,mutation2,....mutation100 {code}
* Assume each hint runnable takes 1 second; then it will take 10 seconds to
process the 100 hints, and only after that will the local mutations be processed.
So, in production, N1 would look inactive for almost 10 seconds: it is just
writing hints locally and not participating in any quorum, etc.
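The 10-second figure follows directly from the configuration above: 100 queued hints drained 10 at a time (concurrent_writes) at an assumed 1 second per hint runnable. A minimal sketch of that back-of-the-envelope calculation (the per-hint latency is the assumption from this example, not a measured value):

```java
public class BacklogDrainTime {
    public static void main(String[] args) {
        int queuedHints = 100;       // hints queued ahead of the mutations
        int concurrentWrites = 10;   // concurrent_writes in cassandra.yaml
        double secondsPerHint = 1.0; // assumed latency of one hint runnable

        // Ceil-divide the backlog into batches of concurrent_writes;
        // each batch occupies all workers for secondsPerHint.
        int batches = (queuedHints + concurrentWrites - 1) / concurrentWrites;
        double drainSeconds = batches * secondsPerHint;
        System.out.println(drainSeconds); // 10.0
    }
}
```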
The problem becomes severe under high load: if hints pile up to 1M, then N1
will choke. The only remedy at that point is for an operator to restart N1 to
drain all the piled-up hints from the Stage.MUTATION queue.
The above problem happens because local hint writing and local mutation both
use the same queue, i.e., Stage.MUTATION. Local mutation writing is on the hot
path, whereas a slight delay in local hint writing causes no great trouble.
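The head-of-line blocking described above can be sketched with a plain single-threaded executor standing in for the shared Stage.MUTATION queue (class and task names here are illustrative, not Cassandra's actual internals):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class SharedQueueDemo {
    public static void main(String[] args) throws Exception {
        // One worker and one FIFO queue, like a single-threaded Stage.
        ExecutorService sharedStage = Executors.newSingleThreadExecutor();
        List<String> completionOrder = new ArrayList<>();

        // Hints are enqueued first, as in the StorageProxy snippet above.
        for (int i = 1; i <= 3; i++) {
            final int id = i;
            sharedStage.submit(() -> completionOrder.add("hint" + id));
        }
        // The online mutation lands behind every queued hint.
        sharedStage.submit(() -> completionOrder.add("mutation1"));

        sharedStage.shutdown();
        sharedStage.awaitTermination(5, TimeUnit.SECONDS);

        // FIFO ordering: the mutation cannot run until all hints drain.
        System.out.println(completionOrder);
    }
}
```

With a real backlog of 1M slow hints, the same FIFO ordering is what keeps online mutations waiting indefinitely.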
*Reproducible steps*
# Pull the latest 4.1.x release
# Create a 5-node cluster
# Set the following configuration
{code:java}
native_transport_timeout: 10s
write_request_timeout_in_ms: 2000
enforce_native_deadline_for_hints: true{code}
# Inject 1s of latency inside the following API in _StorageProxy.java_ on all
five nodes:
{code:java}
private static void performLocally(Stage stage, Replica localReplica, final Runnable runnable, final RequestCallback<?> handler, Object description, Dispatcher.RequestTime requestTime)
{
    stage.maybeExecuteImmediately(new LocalMutationRunnable(localReplica, requestTime)
    {
        public void runMayThrow()
        {
            try
            {
                Thread.sleep(1000); // Inject latency here
                runnable.run();
                handler.onResponse(null);
            }
            catch (Exception ex)
            {
                if (!(ex instanceof WriteTimeoutException))
                    logger.error("Failed to apply mutation locally : ", ex);
                handler.onFailure(FBUtilities.getBroadcastAddressAndPort(), RequestFailureReason.forException(ex));
            }
        }

        @Override
        public String description()
        {
            // description is an Object and toString() is called so we do not
            // have to evaluate Mutation.toString() unless explicitly checked
            return description.toString();
        }

        @Override
        protected Verb verb()
        {
            return Verb.MUTATION_REQ;
        }
    });
} {code}
# Run a write-only stress test for an hour or so
# You will see the Stage.MUTATION queue pile up to >1 million entries
# Stop the load
# Stage.MUTATION will not clear immediately, and you cannot perform new
writes. At this point the Cassandra cluster has become inoperable from the
new-mutations point of view; only reads will be served
*Solution*
The solution is to segregate the local mutation queue from the local hint
writing queue, so a hint backlog can no longer block online mutations.
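The proposed segregation can be sketched as two independent executors, so even a backlog of slow hints never delays an online mutation (the class and stage names below are hypothetical, not the actual patch's API):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

public class SegregatedStagesDemo {
    public static void main(String[] args) throws Exception {
        // Separate queues: hints and online mutations no longer share a Stage.
        ExecutorService hintStage = Executors.newSingleThreadExecutor();
        ExecutorService mutationStage = Executors.newSingleThreadExecutor();

        AtomicBoolean mutationDone = new AtomicBoolean(false);

        // Pile slow hints onto the hint stage only.
        for (int i = 0; i < 3; i++) {
            hintStage.submit(() -> {
                try { Thread.sleep(500); } catch (InterruptedException ignored) { }
            });
        }
        // The mutation runs immediately on its own stage, unaffected by hints.
        mutationStage.submit(() -> mutationDone.set(true));

        mutationStage.shutdown();
        mutationStage.awaitTermination(1, TimeUnit.SECONDS);
        System.out.println("mutation completed while hints still queued: " + mutationDone.get());

        hintStage.shutdownNow();
    }
}
```

The design point is isolation: the hint backlog grows or drains on its own queue, while the mutation queue depth stays bounded by actual write traffic.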
--
This message was sent by Atlassian Jira
(v8.20.10#820010)