[ https://issues.apache.org/jira/browse/CASSANDRA-6345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13832783#comment-13832783 ]
Rick Branson commented on CASSANDRA-6345:
-----------------------------------------

Thanks for taking the time to explain the consistency story. It makes perfect sense.

My comment about defensiveness suggested bumping the version number (which is practically free) each time the TokenMetadata (TM) write lock is released, in addition to the existing invalidations. You're probably a much better judge of how useful this would be, so it's up to you.

Really nice that the v5 patch is so compact. Two minor comments: the endpointsLock declaration is still in there, and, not to be all nitpicky, there are two typos in the comments ("wo we keep" and "clone got invalidted").

> Endpoint cache invalidation causes CPU spike (on vnode rings?)
> --------------------------------------------------------------
>
>                 Key: CASSANDRA-6345
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6345
>             Project: Cassandra
>          Issue Type: Bug
>         Environment: 30 nodes total, 2 DCs
> Cassandra 1.2.11
> vnodes enabled (256 per node)
>            Reporter: Rick Branson
>            Assignee: Jonathan Ellis
>             Fix For: 1.2.13
>
>         Attachments: 6345-rbranson-v2.txt, 6345-rbranson.txt, 6345-v2.txt,
> 6345-v3.txt, 6345-v4.txt, 6345-v5.txt, 6345.txt,
> half-way-thru-6345-rbranson-patch-applied.png
>
>
> We've observed that events which cause invalidation of the endpoint cache
> (update keyspace, add/remove nodes, etc) in AbstractReplicationStrategy
> result in several seconds of thundering herd behavior on the entire cluster.
> A thread dump shows over a hundred threads (I stopped counting at that point)
> with a backtrace like this:
>         at java.net.Inet4Address.getAddress(Inet4Address.java:288)
>         at org.apache.cassandra.locator.TokenMetadata$1.compare(TokenMetadata.java:106)
>         at org.apache.cassandra.locator.TokenMetadata$1.compare(TokenMetadata.java:103)
>         at java.util.TreeMap.getEntryUsingComparator(TreeMap.java:351)
>         at java.util.TreeMap.getEntry(TreeMap.java:322)
>         at java.util.TreeMap.get(TreeMap.java:255)
>         at com.google.common.collect.AbstractMultimap.put(AbstractMultimap.java:200)
>         at com.google.common.collect.AbstractSetMultimap.put(AbstractSetMultimap.java:117)
>         at com.google.common.collect.TreeMultimap.put(TreeMultimap.java:74)
>         at com.google.common.collect.AbstractMultimap.putAll(AbstractMultimap.java:273)
>         at com.google.common.collect.TreeMultimap.putAll(TreeMultimap.java:74)
>         at org.apache.cassandra.utils.SortedBiMultiValMap.create(SortedBiMultiValMap.java:60)
>         at org.apache.cassandra.locator.TokenMetadata.cloneOnlyTokenMap(TokenMetadata.java:598)
>         at org.apache.cassandra.locator.AbstractReplicationStrategy.getNaturalEndpoints(AbstractReplicationStrategy.java:104)
>         at org.apache.cassandra.service.StorageService.getNaturalEndpoints(StorageService.java:2671)
>         at org.apache.cassandra.service.StorageProxy.performWrite(StorageProxy.java:375)
>
> It looks like there's a large amount of cost in the
> TokenMetadata.cloneOnlyTokenMap that
> AbstractReplicationStrategy.getNaturalEndpoints is calling each time there is
> a cache miss for an endpoint. It seems as if this would only impact clusters
> with large numbers of tokens, so it's probably a vnodes-only issue.
>
> Proposal: In AbstractReplicationStrategy.getNaturalEndpoints(), cache the
> cloned TokenMetadata instance returned by TokenMetadata.cloneOnlyTokenMap(),
> wrapping it with a lock to prevent stampedes, and clearing it in
> clearEndpointCache(). Thoughts?
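The proposal in the quoted description (cache the clone, guard it with a lock, clear it in clearEndpointCache()) can be sketched roughly as follows. This is a minimal, self-contained illustration under stated assumptions, not the committed patch: TokenRing and CachedStrategy are hypothetical stand-ins for TokenMetadata and AbstractReplicationStrategy, a plain TreeMap stands in for the real token map, and only the method names cloneOnlyTokenMap() and clearEndpointCache() are borrowed from the issue. After an invalidation, exactly one thread performs the clone; the rest of the herd either waits briefly on the lock or reads the published clone.

```java
import java.util.TreeMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for TokenMetadata; cloning the token map is the expensive step.
final class TokenRing {
    final AtomicInteger clones = new AtomicInteger(); // counts clones, for illustration only
    private final TreeMap<Long, String> tokens = new TreeMap<>();

    synchronized void addToken(long token, String endpoint) {
        tokens.put(token, endpoint);
    }

    synchronized TreeMap<Long, String> cloneOnlyTokenMap() {
        clones.incrementAndGet();
        return new TreeMap<>(tokens); // cost grows with the number of tokens (256 per node with vnodes)
    }
}

// Hypothetical stand-in for the strategy that caches one clone of the token map.
final class CachedStrategy {
    private final TokenRing ring;
    private final Object cloneLock = new Object();
    private volatile TreeMap<Long, String> cachedClone; // null means "invalidated"

    CachedStrategy(TokenRing ring) {
        this.ring = ring;
    }

    TreeMap<Long, String> tokenMapSnapshot() {
        TreeMap<Long, String> tm = cachedClone; // fast path: no locking on a cache hit
        if (tm != null) {
            return tm;
        }
        synchronized (cloneLock) {
            tm = cachedClone; // re-check: another thread may have cloned while we waited
            if (tm == null) {
                tm = ring.cloneOnlyTokenMap();
                cachedClone = tm; // volatile write publishes the fully built clone
            }
            return tm;
        }
    }

    void clearEndpointCache() {
        synchronized (cloneLock) { // serialize with in-flight clones so an invalidation is not lost
            cachedClone = null;
        }
    }
}
```

Taking cloneLock in clearEndpointCache() means a thread that began cloning before a ring change cannot publish its possibly stale clone after the invalidation has run; the trade-off is that invalidation waits for an in-flight clone. The sketch assumes clearEndpointCache() is called after every ring mutation.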
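The comment's defensive suggestion (bump a version number each time the TokenMetadata write lock is released) can be sketched the same way. Again, VersionedRing, RingSnapshot and VersionCheckedCache are hypothetical classes, not Cassandra's. The ring increments an AtomicLong just before releasing its write lock, each clone records the version it was taken at under the read lock, and a cached clone is served only while its version still matches the ring's.

```java
import java.util.TreeMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical clone paired with the ring version it was taken at.
final class RingSnapshot {
    final long version;
    final TreeMap<Long, String> tokens;

    RingSnapshot(long version, TreeMap<Long, String> tokens) {
        this.version = version;
        this.tokens = tokens;
    }
}

// Hypothetical ring that bumps its version each time the write lock is released.
final class VersionedRing {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final AtomicLong version = new AtomicLong();
    private final TreeMap<Long, String> tokens = new TreeMap<>();

    void addToken(long token, String endpoint) {
        lock.writeLock().lock();
        try {
            tokens.put(token, endpoint);
        } finally {
            version.incrementAndGet(); // bump while still holding the write lock
            lock.writeLock().unlock();
        }
    }

    long currentVersion() {
        return version.get();
    }

    RingSnapshot cloneWithVersion() {
        lock.readLock().lock(); // no writer can run, so version and contents agree
        try {
            return new RingSnapshot(version.get(), new TreeMap<>(tokens));
        } finally {
            lock.readLock().unlock();
        }
    }
}

// Hypothetical cache that serves a clone only while its version is current.
final class VersionCheckedCache {
    private final VersionedRing ring;
    private final Object cloneLock = new Object();
    private volatile RingSnapshot cached;

    VersionCheckedCache(VersionedRing ring) {
        this.ring = ring;
    }

    TreeMap<Long, String> tokenMapSnapshot() {
        RingSnapshot s = cached;
        if (s != null && s.version == ring.currentVersion()) {
            return s.tokens; // hit: the ring has not changed since this clone was taken
        }
        synchronized (cloneLock) {
            s = cached; // re-check under the lock so only one thread re-clones
            if (s == null || s.version != ring.currentVersion()) {
                s = ring.cloneWithVersion();
                cached = s;
            }
            return s.tokens;
        }
    }
}
```

Because a stale clone fails the version check on its next read, this works as a backstop to explicit invalidation rather than a replacement for it, which matches the comment's framing of the version bump as an optional, defensive addition.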