[ 
https://issues.apache.org/jira/browse/CASSANDRA-18120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17849074#comment-17849074
 ] 

Michael Semb Wever edited comment on CASSANDRA-18120 at 5/23/24 6:38 PM:
-------------------------------------------------------------------------

[~shunsaker], do you want to share the patch you're willing to upstream?  That 
patch would have had a lot of production exposure already, so it would be my 
preference.  [~maximc], are you ok if we focus on Shayne's patch? I know 
you've done a lot of work already, and it sucks when you've completed a patch 
and it was the first patch offered.  Given your expertise now, and not letting 
it go to waste, it would be very valuable to have you as a reviewer (and 
tester).

bq. Michael Semb Wever and Maxim Chanturiay provide strong arguments against 
Dynamic snitch. 

This is not related to logged batch writes, and today the dynamic snitch does 
nothing for it anyway.  The advice to disable the dynamic snitch has been a 
long-standing recommendation from The Last Pickle, aimed at competent Cassandra 
operators that have healthy and performant clusters, and solid enough 
monitoring and alerting in place to otherwise detect and deal with a slow node. 
 The dynamic snitch comes with its own overhead, and on healthy performant 
clusters can't keep up, so offers very little value.  (Don't look past those 
caveats though!)

If you have a problem with slow nodes, and don't have a way to deal with it, 
then the dynamic snitch is a good option, and adding the same ability to the 
batchlog makes sense.  


was (Author: michaelsembwever):
[~shunsaker], do you want to share the patch you're willing to upstream?  This 
patch has had a lot of production exposure already, so it has my preference.  
[~maximc], are you ok if we focus on Shayne's patch? I know you've done a lot 
of work already, and it sucks when you've completed a patch and it was the 
first patch offered.  Given your expertise now, and not letting it go to waste, 
it would be very valuable to have you as a reviewer (and tester).

bq. Michael Semb Wever and Maxim Chanturiay provide strong arguments against 
Dynamic snitch. 

This is not related to logged batch writes, and today the dynamic snitch does 
nothing for it anyway.  The advice to disable the dynamic snitch has been a 
long-standing recommendation from The Last Pickle, aimed at competent Cassandra 
operators that have healthy and performant clusters, and solid enough 
monitoring and alerting in place to otherwise detect and deal with a slow node. 
 The dynamic snitch comes with its own overhead, and on healthy performant 
clusters can't keep up, so offers very little value.  (Don't look past those 
caveats though!)

If you have a problem with slow nodes, and don't have a way to deal with it, 
then the dynamic snitch is a good option, and adding the same ability to the 
batchlog makes sense.  

> Single slow node dramatically reduces cluster logged batch write throughput 
> regardless of CL
> --------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-18120
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-18120
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Dan Sarisky
>            Assignee: Maxim Chanturiay
>            Priority: Normal
>
> We issue writes to Cassandra as logged batches (RF=3, consistency levels TWO, 
> QUORUM, or LOCAL_QUORUM).
>  
> On clusters of any size - a single extremely slow node causes a ~90% loss of 
> cluster-wide throughput using batched writes.  We can replicate this in the 
> lab via CPU or disk throttling.  I observe this in 3.11, 4.0, and 4.1.
>  
> It appears the mechanism in play is:
> Those logged batches are immediately written to two replica nodes and the 
> actual mutations aren't processed until those two nodes acknowledge the batch 
> statements.  Those replica nodes are selected randomly from all nodes in the 
> local data center currently up in gossip.  If a single node is slow, but 
> still thought to be up in gossip, this eventually leaves every other node 
> with all of its MutationStage threads waiting while the slow replica accepts 
> batch writes.
>  
> The code in play appears to be:
> See
> [https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/locator/ReplicaPlans.java#L245].
>   In the method filterBatchlogEndpoints() there is a
> Collections.shuffle() to order the endpoints and a
> FailureDetector.isEndpointAlive() to test if the endpoint is acceptable.
>  
> This behavior causes Cassandra to move from a multi-node fault tolerant 
> system to a collection of single points of failure.
>  
> We try to take administrator actions to kill off the extremely slow nodes, 
> but it would be great to have some notion of "what node is a bad choice" when 
> writing logged batches to replica nodes.
>  
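For readers following the thread, the selection behaviour described in the report can be sketched roughly as below. This is a simplified illustration and not the code in ReplicaPlans.java: it ignores any rack-aware logic the real method may contain, endpoints are plain strings, and the class name BatchlogEndpointSketch, both method names, and the latencyScore map are hypothetical names introduced only for this example. The first method mirrors the shuffle-then-check-liveness approach from the report; the second shows one possible shape of the "what node is a bad choice" idea, preferring live endpoints with the lowest latency score, similar in spirit to what the dynamic snitch does for reads.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.function.Predicate;

// Hypothetical sketch; none of these names exist in Cassandra.
final class BatchlogEndpointSketch {

    // Behaviour as described in the report: shuffle the candidates, then
    // take the first two that the failure detector considers alive.
    // A slow-but-alive node is as likely to be picked as a healthy one.
    public static List<String> pickRandomAlive(List<String> endpoints,
                                               Predicate<String> isAlive,
                                               Random rnd) {
        List<String> candidates = new ArrayList<>(endpoints);
        Collections.shuffle(candidates, rnd);
        List<String> chosen = new ArrayList<>(2);
        for (String ep : candidates) {
            if (isAlive.test(ep)) {
                chosen.add(ep);
            }
            if (chosen.size() == 2) {
                break;
            }
        }
        return chosen;
    }

    // Assumed alternative: among live endpoints, prefer the two with the
    // lowest latency score (lower is better; unknown endpoints score 0.0).
    // The shuffle before the stable sort breaks ties randomly.
    public static List<String> pickFastestAlive(List<String> endpoints,
                                                Predicate<String> isAlive,
                                                Map<String, Double> latencyScore,
                                                Random rnd) {
        List<String> candidates = new ArrayList<>();
        for (String ep : endpoints) {
            if (isAlive.test(ep)) {
                candidates.add(ep);
            }
        }
        Collections.shuffle(candidates, rnd);
        candidates.sort(Comparator.comparingDouble(
                (String ep) -> latencyScore.getOrDefault(ep, 0.0)));
        return new ArrayList<>(candidates.subList(0, Math.min(2, candidates.size())));
    }
}
```

Always taking the two lowest scores would concentrate batchlog writes on a couple of nodes, so a real patch would likely need to spread load among the acceptable endpoints and only steer away from clear outliers.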


