[ https://issues.apache.org/jira/browse/CASSANDRA-6465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jonathan Ellis updated CASSANDRA-6465:
--------------------------------------
    Priority: Minor  (was: Major)
    Fix Version/s: 2.0.4
    Assignee: Tyler Hobbs
    Labels: gossip  (was: )

> DES scores fluctuate too much for cache pinning
> -----------------------------------------------
>
> Key: CASSANDRA-6465
> URL: https://issues.apache.org/jira/browse/CASSANDRA-6465
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Environment: 1.2.11, 2 DC cluster
> Reporter: Chris Burroughs
> Assignee: Tyler Hobbs
> Priority: Minor
> Labels: gossip
> Fix For: 2.0.4
>
> Attachments: des-score-graph.png, des.sample.15min.csv, get-scores.py
>
>
> To quote the conf:
> {noformat}
> # if set greater than zero and read_repair_chance is < 1.0, this will allow
> # 'pinning' of replicas to hosts in order to increase cache capacity.
> # The badness threshold will control how much worse the pinned host has to be
> # before the dynamic snitch will prefer other replicas over it. This is
> # expressed as a double which represents a percentage. Thus, a value of
> # 0.2 means Cassandra would continue to prefer the static snitch values
> # until the pinned host was 20% worse than the fastest.
> dynamic_snitch_badness_threshold: 0.1
> {noformat}
> An assumption of this feature is that scores will vary by less than
> dynamic_snitch_badness_threshold during normal operations. Attached is the
> result of polling a node for the scores of 6 different endpoints at 1 Hz for
> 15 minutes. The endpoints to sample were chosen with `nodetool getendpoints`
> for a row that is known to get reads. The node was acting as a coordinator for
> a few hundred req/second, so it should have sufficient data to work with.
> Other traces on a second cluster have produced similar results.
> * The scores vary by far more than I would expect, as shown by the difficulty
> of seeing anything useful in that graph.
> * The difference between the best and next-best score is usually > 10%
> (default dynamic_snitch_badness_threshold).
> Neither ClientRequest nor ColumnFamily metrics showed wild changes during the
> data gathering period.
> Attachments:
> * Jython script cobbled together to gather the data (based on work on the
> mailing list from Maki Watanabe a while back)
> * CSV of DES scores for 6 endpoints, polled about once a second
> * Attempt at making a graph
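
For readers who want to repeat the second observation against their own samples, here is a minimal Python sketch of the comparison described in the conf comment. It is only an illustration and is not taken from Cassandra's source or from the attached get-scores.py. The function names are hypothetical, the input layout (one list of endpoint scores per poll) is an assumption since the CSV format is not shown in this message, and it assumes a lower score means a better endpoint.

```python
def pinning_would_break(pinned_score, best_score, threshold):
    """Return True if the pinned host is more than `threshold` (a fraction,
    e.g. 0.1 for 10%) worse than the best host.

    Assumption: lower scores are better. This mirrors the wording of the
    conf comment only; it is not Cassandra's actual implementation.
    """
    return pinned_score > best_score * (1.0 + threshold)


def fraction_exceeding(samples, threshold):
    """Fraction of polls in which the next-best score is more than
    `threshold` worse than the best score.

    `samples` is a list of polls, each poll a list of endpoint scores
    (hypothetical layout). Polls with fewer than two scores are skipped.
    """
    exceeded = 0
    counted = 0
    for scores in samples:
        if len(scores) < 2:
            continue
        ordered = sorted(scores)  # ascending: best (lowest) score first
        best, next_best = ordered[0], ordered[1]
        counted += 1
        if pinning_would_break(next_best, best, threshold):
            exceeded += 1
    return exceeded / counted if counted else 0.0


if __name__ == "__main__":
    # Made-up scores purely to exercise the functions; not real DES data.
    polls = [
        [1.0, 1.05, 2.0],
        [1.0, 1.5, 3.0],
        [0.5, 0.6, 0.7],
    ]
    print(fraction_exceeding(polls, threshold=0.1))
```

A result close to 1.0 on real samples would match the report above: with the default threshold of 0.1, the best and next-best endpoints are rarely within the band that pinning relies on.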