[
https://issues.apache.org/jira/browse/CASSANDRA-19958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17885421#comment-17885421
]
Jaydeepkumar Chovatia commented on CASSANDRA-19958:
---------------------------------------------------
I kept it configurable to maintain backward compatibility in our production. We
were doing this for the first time, so I just wanted to keep it configurable in
case of any issues. Now, I have already tested this in our production, and it
has been working fine without any issues.
Anyway, there is no strong reason; let me remove the configuration part. I will
update the PR shortly.
> Local Hints are stepping on local mutations
> -------------------------------------------
>
> Key: CASSANDRA-19958
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19958
> Project: Cassandra
> Issue Type: Bug
> Components: Legacy/Local Write-Read Paths
> Reporter: Jaydeepkumar Chovatia
> Priority: Normal
> Attachments: image-2024-09-26-15-28-20-435.png
>
>
> Cassandra uses the same queue (Stage.MUTATION) to process local mutations as
> well as local hint writes. CASSANDRA-19534 added timeouts for local
> mutations, but local hint writing does not honor that timeout by design, as
> it is governed by a different timeout, i.e. _max_hint_window_in_ms_
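> 
> The asymmetry can be summarized in a short sketch (plain Java, not the actual
> Cassandra code; names are illustrative only): both task types land on the same
> executor, but only mutations can be shed once their deadline passes.
> {code:java}
> import java.util.concurrent.ExecutorService;
> import java.util.concurrent.Executors;
> 
> public class SharedStageSketch
> {
>     // Hypothetical stand-in for the executor behind Stage.MUTATION
>     static final ExecutorService MUTATION_STAGE = Executors.newFixedThreadPool(10);
> 
>     static void performLocalMutation(Runnable mutation, long deadlineNanos)
>     {
>         MUTATION_STAGE.execute(() -> {
>             // CASSANDRA-19534-style behavior: drop a mutation whose deadline
>             // has already passed instead of doing useless work
>             if (System.nanoTime() > deadlineNanos)
>                 return;
>             mutation.run();
>         });
>     }
> 
>     static void writeLocalHint(Runnable hintWrite)
>     {
>         // Hints carry no per-request deadline; they are only bounded by
>         // max_hint_window_in_ms, so every queued hint task runs to completion
>         MUTATION_STAGE.execute(hintWrite);
>     }
> } {code}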
>
> *The Problem*
> Let's understand the problem with a five-node Cassandra cluster N1, N2, N3,
> N4, N5 and the following configuration:
> * concurrent_writes: 10
> * native_transport_timeout: 5s
> * write_request_timeout_in_ms: 2000 // 2 seconds
> 
> +StorageProxy.java snippet...+
>
> !image-2024-09-26-15-28-20-435.png|width=600,height=200!
>
> Let's assume N4 and N5 are slow, flapping, or down, and that N1 receives a
> flurry of mutations. This is what happens on N1:
> # Line no 1542: Append 100 hints to the Stage.MUTATION queue
> # Line no 1547: Append 100 local mutations to the Stage.MUTATION queue
> The Stage.MUTATION queue on N1 would then look as follows:
> {code:java}
> hint1,hint2,hint3,....hint100,mutation1,mutation2,....mutation100 {code}
> * Assume each hint runnable takes 1 second; with 10 concurrent writers it
> will take 10 seconds to process the 100 hints, and only after that will the
> local mutations be processed.
>
> So, in production, N1 would appear inactive for almost 10 seconds: it is just
> writing hints locally and not participating in any quorum, etc. The sketch
> below illustrates the head-of-line blocking.
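> 
> Here is a minimal, self-contained sketch of that arithmetic (plain
> java.util.concurrent, not Cassandra code): a 10-thread pool standing in for
> Stage.MUTATION with concurrent_writes: 10, loaded with 100 one-second hint
> tasks ahead of the mutations.
> {code:java}
> import java.util.concurrent.ExecutorService;
> import java.util.concurrent.Executors;
> import java.util.concurrent.TimeUnit;
> 
> public class HeadOfLineBlockingDemo
> {
>     public static void main(String[] args) throws Exception
>     {
>         // concurrent_writes: 10 -> a 10-thread shared stage
>         ExecutorService stage = Executors.newFixedThreadPool(10);
>         long start = System.nanoTime();
> 
>         // 100 hint tasks, each taking ~1 second, enqueued first
>         for (int i = 0; i < 100; i++)
>             stage.execute(() -> sleepQuietly(1000));
> 
>         // 100 local mutations enqueued behind them
>         for (int i = 0; i < 100; i++)
>             stage.execute(() -> System.out.printf("mutation started after %d ms%n",
>                 TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start)));
> 
>         stage.shutdown();
>         stage.awaitTermination(1, TimeUnit.MINUTES);
>         // Every mutation starts ~10,000 ms after submission:
>         // 100 hints / 10 threads x 1 s each = 10 s of head-of-line blocking
>     }
> 
>     static void sleepQuietly(long ms)
>     {
>         try { Thread.sleep(ms); }
>         catch (InterruptedException e) { Thread.currentThread().interrupt(); }
>     }
> } {code}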
>
> The problem becomes much worse under high load: if hints pile up to 1
> million, N1 will choke. The only remedy at that point is for an operator to
> restart N1, draining all the piled-up hints from the Stage.MUTATION queue.
>
> The above problem happens because local hint writes and local mutations both
> use the same queue, i.e., Stage.MUTATION.
> Local mutation writing is on the hot path, whereas a slight delay in local
> hint writing does not cause much trouble.
>
> *Reproducible steps*
> # Pull the latest 4.1.x release
> # Create a 5-node cluster
> # Set the following configuration
> {code:java}
> native_transport_timeout: 10s
> write_request_timeout_in_ms: 2000
> enforce_native_deadline_for_hints: true{code}
> # Inject 1s of latency inside the following API in _StorageProxy.java_ on
> all five nodes:
> {code:java}
> private static void performLocally(Stage stage, Replica localReplica,
>                                    final Runnable runnable, final RequestCallback<?> handler,
>                                    Object description, Dispatcher.RequestTime requestTime)
> {
>     stage.maybeExecuteImmediately(new LocalMutationRunnable(localReplica, requestTime)
>     {
>         public void runMayThrow()
>         {
>             try
>             {
>                 Thread.sleep(1000); // Inject latency here
>                 runnable.run();
>                 handler.onResponse(null);
>             }
>             catch (Exception ex)
>             {
>                 if (!(ex instanceof WriteTimeoutException))
>                     logger.error("Failed to apply mutation locally : ", ex);
>                 handler.onFailure(FBUtilities.getBroadcastAddressAndPort(),
>                                   RequestFailureReason.forException(ex));
>             }
>         }
> 
>         @Override
>         public String description()
>         {
>             // description is an Object whose toString() is called lazily, so we
>             // do not have to evaluate Mutation.toString() unless explicitly checked
>             return description.toString();
>         }
> 
>         @Override
>         protected Verb verb()
>         {
>             return Verb.MUTATION_REQ;
>         }
>     });
> } {code}
> # Run a write-only stress workload for an hour or so
> # You will see the Stage.MUTATION queue pile up to >1 million entries
> # Stop the load
> # Stage.MUTATION will not be cleared immediately, and you cannot perform new
> writes. At this point the Cassandra cluster has become inoperable from a
> new-mutations point of view; only reads will be served
>
> *Solution*
> The solution is to segregate the local mutation queue from the local hint
> writing queue. Here is the PR:
> [https://github.com/apache/cassandra/pull/3580]
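> 
> A conceptual sketch of the segregation (the PR above has the real patch;
> names and pool sizes here are illustrative only):
> {code:java}
> import java.util.concurrent.ExecutorService;
> import java.util.concurrent.Executors;
> 
> public class SegregatedStagesSketch
> {
>     // Hot path: local mutations keep their own stage
>     static final ExecutorService MUTATION_STAGE = Executors.newFixedThreadPool(10);
>     // Hypothetical dedicated stage for local hint writes
>     static final ExecutorService HINT_WRITE_STAGE = Executors.newFixedThreadPool(2);
> 
>     static void performLocalMutation(Runnable mutation)
>     {
>         // Unaffected by any hint backlog
>         MUTATION_STAGE.execute(mutation);
>     }
> 
>     static void writeLocalHint(Runnable hintWrite)
>     {
>         // A pile-up here can no longer block local mutations
>         HINT_WRITE_STAGE.execute(hintWrite);
>     }
> } {code}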
>