[ https://issues.apache.org/jira/browse/CASSANDRA-6908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Nate McCall updated CASSANDRA-6908:
-----------------------------------
    Reproduced In: 2.1.9, 2.1.7  (was: 2.1.9)

> Dynamic endpoint snitch destabilizes cluster under heavy load
> -------------------------------------------------------------
>
>                 Key: CASSANDRA-6908
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6908
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Config, Core
>            Reporter: Bartłomiej Romański
>         Attachments: as-dynamic-snitch-disabled.png
>
>
> We observe that with dynamic snitch disabled our cluster is much more stable
> than with dynamic snitch enabled.
> We've got a 15-node cluster with pretty strong machines (2xE5-2620, 64 GB
> RAM, 2x480 GB SSD). We mostly do reads (about 300k/s).
> We use Astyanax on the client side with the TOKEN_AWARE option enabled. It
> automatically directs read queries to one of the nodes responsible for the
> given token.
> In that case, with dynamic snitch disabled, Cassandra always handles the read
> locally. With dynamic snitch enabled, Cassandra very often decides to proxy
> the read to some other node. This causes much higher CPU usage and produces
> much more garbage, which results in more frequent GC pauses (the young
> generation fills up quicker). By "much higher" and "much more" I mean 1.5-2x.
> I'm aware that a higher dynamic_snitch_badness_threshold value should solve
> that issue. The default value is 0.1. I've looked at the scores exposed in JMX
> and the problem is that our values seem to be completely random. They are
> usually between 0.5 and 2.0, but change randomly every time I hit refresh.
> Of course, I can set dynamic_snitch_badness_threshold to 5.0 or something
> like that, but the result will be similar to simply disabling the dynamic
> snitch altogether (that's what we did).
> I've tried to understand the logic behind these scores and I'm not
> sure if I get the idea...
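To illustrate why a threshold of 0.1 offers little protection when scores jump between 0.5 and 2.0, here is a rough sketch. It assumes the snitch keeps the static (subsnitch) replica order unless some preferred replica scores worse than (1 + threshold) times the score at the same position in the score-sorted order; that rule is an assumption for illustration and is not stated in this ticket. The function names and the uniform random scores are hypothetical stand-ins for the noisy JMX values described above.

```python
import random


def keeps_subsnitch_order(scores_in_subsnitch_order, badness_threshold):
    """Return True if the static replica order survives the badness check.

    Assumed rule: the order is abandoned as soon as a replica at position i
    scores worse than (1 + threshold) times the i-th best score.
    """
    best_first = sorted(scores_in_subsnitch_order)
    for preferred, best in zip(scores_in_subsnitch_order, best_first):
        if preferred > best * (1.0 + badness_threshold):
            return False
    return True


def fraction_kept(badness_threshold, replicas=3, trials=10_000, seed=42):
    """Share of trials in which the static order is kept.

    Scores are drawn uniformly from [0.5, 2.0], a synthetic model of the
    range reported in the ticket, not measured data.
    """
    rng = random.Random(seed)
    kept = 0
    for _ in range(trials):
        scores = [rng.uniform(0.5, 2.0) for _ in range(replicas)]
        kept += keeps_subsnitch_order(scores, badness_threshold)
    return kept / trials


if __name__ == "__main__":
    for threshold in (0.1, 1.0, 5.0):
        print(threshold, fraction_kept(threshold))
```

Under these assumptions a threshold of 5.0 always keeps the static order, because no score in [0.5, 2.0] can exceed 6 times another, which matches the observation that such a value behaves like disabling the dynamic snitch.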
> It's a sum (without any multipliers) of two components:
> - the ratio of the given node's recent latency to the recent average node
> latency
> - something called 'severity', which, if I analyzed the code correctly, is the
> result of BackgroundActivityMonitor.getIOWait(): the ratio of "iowait"
> CPU time to the whole CPU time as reported in /proc/stat (the ratio is
> multiplied by 100)
> In our case the second value is around 0-2% but varies quite heavily from
> second to second.
> What's the idea behind simply adding these two values without any multipliers
> (e.g. the second one is a percentage while the first one is not)? Are we sure
> this is the best possible way of calculating the final score?
> Is there a way to force Cassandra to use (much) longer samples? In our case
> we probably need that to get stable values. The 'severity' is calculated for
> each second. The mean latency is calculated based on some magic, hardcoded
> values (ALPHA = 0.75, WINDOW_SIZE = 100).
> Am I right that there's no way to tune that without hacking the code?
> I'm aware that there's a dynamic_snitch_update_interval_in_ms property in the
> config file, but that only determines how often the scores are recalculated,
> not how long the samples are taken for. Is that correct?
> To sum up, it would be really nice to have more control over dynamic snitch
> behavior, or at least to have the official option to disable it described in
> the default config file (it took me some time to discover that we can just
> disable it instead of hacking with dynamic_snitch_badness_threshold=1000).
> Currently, for some scenarios (like ours - optimized cluster, token aware
> client, heavy load), it causes more harm than good.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
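For reference, the score described in the report (latency ratio plus an iowait percentage) can be sketched as follows. This is a minimal illustration of the reporter's description, not Cassandra's actual code: the function names are hypothetical, the sample `cpu` lines are made up, and the iowait share is computed from two snapshots of the aggregate `cpu` line of /proc/stat, where iowait is the fifth counter.

```python
def cpu_fields(stat_line):
    """Parse the aggregate 'cpu' line of /proc/stat into integer tick counts."""
    parts = stat_line.split()
    if parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return [int(p) for p in parts[1:]]


def iowait_percent(before, after):
    """iowait share of CPU time between two snapshots, multiplied by 100."""
    a, b = cpu_fields(before), cpu_fields(after)
    # Use user, nice, system, idle, iowait, irq, softirq, steal (first 8
    # counters); guest time is already included in user time.
    deltas = [y - x for x, y in zip(a, b)][:8]
    total = sum(deltas)
    if total == 0:
        return 0.0
    return 100.0 * deltas[4] / total  # index 4 is the iowait counter


def score(node_latency_ms, mean_latency_ms, severity):
    """Unweighted sum of the latency ratio and the severity, as reported."""
    return node_latency_ms / mean_latency_ms + severity


if __name__ == "__main__":
    # Made-up snapshots taken one interval apart: 10 of 1000 ticks in iowait.
    before = "cpu 1000 0 500 8000 100 0 0 0"
    after = "cpu 1090 0 500 8900 110 0 0 0"
    severity = iowait_percent(before, after)
    print(severity)
    # An average-latency node with 1% iowait ...
    print(score(2.0, 2.0, severity))
    # ... scores the same as a node twice as slow as average with no iowait.
    print(score(4.0, 2.0, 0.0))
```

Because the severity is a percentage and the latency ratio hovers around 1, a swing of one or two percentage points in iowait moves the sum as much as a doubling of latency would, which is consistent with the unstable scores the reporter sees.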