Jeremiah Jordan created CASSANDRA-11738:
-------------------------------------------

             Summary: Re-think the use of Severity in the DynamicEndpointSnitch calculation
                 Key: CASSANDRA-11738
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11738
             Project: Cassandra
          Issue Type: Bug
            Reporter: Jeremiah Jordan


CASSANDRA-11737 was opened to allow completely disabling the use of severity in
the DynamicEndpointSnitch calculation, but that is a pretty big hammer. There
is probably something we can do to make better use of the severity score.

The issue seems to be that, in the current code, severity is given equal weight
with latency, and that severity is based only on disk IO. If a node is CPU
bound (say, catching up on LCS compactions after a bootstrap, repair, or
replace), its IO wait can be low even though the latency to the node is high.
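A minimal Java sketch of how this can go wrong, assuming the score is each node's median latency normalized by the slowest node's median latency, plus severity with no weighting, and that lower scores are preferred. The class, node names, and numbers are hypothetical and only illustrate the equal-weight problem.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical illustration of an equal-weight snitch score:
 * score = (median latency / max median latency) + severity, lower is better.
 * Not Cassandra code; names and numbers are made up.
 */
public class EqualWeightScore {

    /** Normalized latency plus unweighted severity; lower scores are preferred. */
    static double score(double medianLatencyMs, double maxLatencyMs, double severity) {
        return medianLatencyMs / maxLatencyMs + severity;
    }

    public static void main(String[] args) {
        // Hypothetical replicas: {median latency in ms, severity derived from IO wait}
        Map<String, double[]> replicas = new LinkedHashMap<>();
        replicas.put("cpuBound", new double[] {40.0, 0.0}); // slow, but IO wait is low
        replicas.put("ioBusy", new double[] {12.0, 0.8});   // fast, but some IO wait

        // Normalize against the slowest replica's median latency
        double maxLatency = 0.0;
        for (double[] r : replicas.values()) {
            maxLatency = Math.max(maxLatency, r[0]);
        }

        for (Map.Entry<String, double[]> e : replicas.entrySet()) {
            double s = score(e.getValue()[0], maxLatency, e.getValue()[1]);
            System.out.printf("%s score=%.2f%n", e.getKey(), s);
        }
    }
}
```

With these made-up numbers, `cpuBound` scores 1.00 and `ioBusy` scores 1.10, so the node that is more than three times slower is still preferred, because its IO wait (and therefore its severity) is low.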

Some ideas I had are:
1. Allowing a yaml parameter to tune how much impact the severity score has in 
the calculation.
2. Taking CPU load into account as well as IO wait (this would probably help in
the cases where I have seen things go sideways).
3. Moving the -D system property from CASSANDRA-11737 to a yaml-level setting.
4. Going back to relying on latency alone and getting rid of severity
altogether. Now that we have rapid read protection, latency may be enough on
its own, since rapid read protection can help in the cases where the
predictive nature of IO wait would have been useful.
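To make ideas 1, 2 and 4 concrete, here is a minimal Java sketch. Everything in it is hypothetical: `severityWeight` stands in for the yaml parameter suggested in idea 1, and taking the maximum of IO wait and CPU load is just one possible way to fold CPU into severity (idea 2). A weight of 0 reduces the score to latency only (idea 4). It assumes the score is normalized median latency plus severity, with lower scores preferred.

```java
/**
 * Hypothetical sketch of ideas 1, 2 and 4. None of these names exist in
 * Cassandra; severityWeight stands in for a possible yaml setting.
 */
public class WeightedSeverityScore {

    /** Hypothetical severity combining IO wait and CPU load, each a fraction in [0, 1]. */
    static double severity(double ioWaitFraction, double cpuLoadFraction) {
        // Take the worse of the two signals so a CPU-bound node is not reported as idle
        return Math.max(ioWaitFraction, cpuLoadFraction);
    }

    /**
     * Normalized latency plus weighted severity; lower is better.
     * severityWeight = 1.0 reproduces equal weighting, 0.0 is latency only.
     */
    static double score(double medianLatencyMs, double maxLatencyMs,
                        double severity, double severityWeight) {
        if (severityWeight < 0.0) {
            throw new IllegalArgumentException("severityWeight must be >= 0");
        }
        return medianLatencyMs / maxLatencyMs + severityWeight * severity;
    }

    public static void main(String[] args) {
        double maxLatency = 40.0;
        // Two hypothetical replicas, with made-up IO wait and CPU load readings
        double cpuBoundSeverity = severity(0.0, 0.9); // low IO wait, high CPU load
        double ioBusySeverity = severity(0.8, 0.2);   // high IO wait, low CPU load

        for (double weight : new double[] {1.0, 0.25, 0.0}) {
            double cpuBound = score(40.0, maxLatency, cpuBoundSeverity, weight);
            double ioBusy = score(12.0, maxLatency, ioBusySeverity, weight);
            System.out.printf("weight=%.2f cpuBound=%.2f ioBusy=%.2f%n",
                    weight, cpuBound, ioBusy);
        }
    }
}
```

With these made-up numbers, the slow, CPU-bound replica scores worse than the IO-busy one at every weight shown, so it is no longer preferred. A tunable weight would let operators dial severity down without the all-or-nothing switch from CASSANDRA-11737.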



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)