[ https://issues.apache.org/jira/browse/CASSANDRA-6345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13823274#comment-13823274 ]
Rick Branson commented on CASSANDRA-6345:
-----------------------------------------

Unfortunately, both of the patches suffer from a deadlock, since the invalidation and the fill are wrapped up in TokenMetadata's locks:

1. T1 acquires the cache read lock
2. T2 acquires the TokenMetadata write lock
3. T1 acquires the cache write lock on a miss
4. T2 blocks on the cache write lock, trying to invalidate
5. T1 blocks on the TokenMetadata read lock, trying to cloneOnlyTokenMap to fill the cache

Working on a fix. (The interleaving is sketched in code at the end of this message.)

> Endpoint cache invalidation causes CPU spike (on vnode rings?)
> --------------------------------------------------------------
>
>                 Key: CASSANDRA-6345
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6345
>             Project: Cassandra
>          Issue Type: Bug
>         Environment: 30 nodes total, 2 DCs
>                      Cassandra 1.2.11
>                      vnodes enabled (256 per node)
>            Reporter: Rick Branson
>            Assignee: Jonathan Ellis
>             Fix For: 1.2.12, 2.0.3
>
>         Attachments: 6345-rbranson.txt, 6345-v2.txt, 6345.txt, half-way-thru-6345-rbranson-patch-applied.png
>
>
> We've observed that events which cause invalidation of the endpoint cache (update keyspace, add/remove nodes, etc.) in AbstractReplicationStrategy result in several seconds of thundering-herd behavior on the entire cluster. A thread dump shows over a hundred threads (I stopped counting at that point) with a backtrace like this:
>
>     at java.net.Inet4Address.getAddress(Inet4Address.java:288)
>     at org.apache.cassandra.locator.TokenMetadata$1.compare(TokenMetadata.java:106)
>     at org.apache.cassandra.locator.TokenMetadata$1.compare(TokenMetadata.java:103)
>     at java.util.TreeMap.getEntryUsingComparator(TreeMap.java:351)
>     at java.util.TreeMap.getEntry(TreeMap.java:322)
>     at java.util.TreeMap.get(TreeMap.java:255)
>     at com.google.common.collect.AbstractMultimap.put(AbstractMultimap.java:200)
>     at com.google.common.collect.AbstractSetMultimap.put(AbstractSetMultimap.java:117)
>     at com.google.common.collect.TreeMultimap.put(TreeMultimap.java:74)
>     at com.google.common.collect.AbstractMultimap.putAll(AbstractMultimap.java:273)
>     at com.google.common.collect.TreeMultimap.putAll(TreeMultimap.java:74)
>     at org.apache.cassandra.utils.SortedBiMultiValMap.create(SortedBiMultiValMap.java:60)
>     at org.apache.cassandra.locator.TokenMetadata.cloneOnlyTokenMap(TokenMetadata.java:598)
>     at org.apache.cassandra.locator.AbstractReplicationStrategy.getNaturalEndpoints(AbstractReplicationStrategy.java:104)
>     at org.apache.cassandra.service.StorageService.getNaturalEndpoints(StorageService.java:2671)
>     at org.apache.cassandra.service.StorageProxy.performWrite(StorageProxy.java:375)
>
> It looks like there's a large amount of cost in the TokenMetadata.cloneOnlyTokenMap call that AbstractReplicationStrategy.getNaturalEndpoints makes on each cache miss for an endpoint. It seems as if this would only impact clusters with large numbers of tokens, so it's probably a vnodes-only issue.
>
> Proposal: in AbstractReplicationStrategy.getNaturalEndpoints(), cache the cloned TokenMetadata instance returned by TokenMetadata.cloneOnlyTokenMap(), wrapping it with a lock to prevent stampedes, and clearing it in clearEndpointCache(). Thoughts?
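
For concreteness, here is a minimal sketch of the pattern the proposal describes: a cache of the cloned token map, filled under a write lock so that concurrent misses produce a single clone rather than a stampede. TokenMap, EndpointCache, get(), and invalidate() are illustrative names only, not the actual patch.

{code}
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Simplified stand-in for TokenMetadata; the clone is the expensive
// operation the cache is meant to amortize.
class TokenMap
{
    TokenMap cloneOnlyTokenMap() { return new TokenMap(); }
}

class EndpointCache
{
    private final ReadWriteLock lock = new ReentrantReadWriteLock();
    private TokenMap cached; // guarded by lock

    // Common case: return the cached clone under the read lock.
    // On a miss, take the write lock so only one thread clones.
    TokenMap get(TokenMap live)
    {
        lock.readLock().lock();
        try
        {
            if (cached != null)
                return cached;
        }
        finally
        {
            lock.readLock().unlock();
        }

        lock.writeLock().lock();
        try
        {
            if (cached == null) // re-check: another thread may have filled it
                cached = live.cloneOnlyTokenMap();
            return cached;
        }
        finally
        {
            lock.writeLock().unlock();
        }
    }

    // Called from clearEndpointCache() on ring or keyspace changes.
    void invalidate()
    {
        lock.writeLock().lock();
        try
        {
            cached = null;
        }
        finally
        {
            lock.writeLock().unlock();
        }
    }
}
{code}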
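
The deadlock in the comment at the top falls out of the resulting lock ordering: the fill path takes the cache lock and then TokenMetadata's lock, while invalidation runs with TokenMetadata's write lock already held and then takes the cache lock, i.e. the opposite order. A stripped-down reproduction, assuming both locks are ReentrantReadWriteLocks; the pause() calls only widen the race window so the numbered interleaving happens reliably:

{code}
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class EndpointCacheDeadlock
{
    static final ReentrantReadWriteLock cacheLock = new ReentrantReadWriteLock();
    static final ReentrantReadWriteLock tokenMetadataLock = new ReentrantReadWriteLock();

    static void pause()
    {
        try { Thread.sleep(100); } catch (InterruptedException e) { }
    }

    public static void main(String[] args)
    {
        // T1: read path that misses the cache and tries to fill it.
        Thread t1 = new Thread(() -> {
            cacheLock.readLock().lock();    // step 1: read lock, observes a miss
            cacheLock.readLock().unlock();
            cacheLock.writeLock().lock();   // step 3: write lock to fill the cache
            pause();
            tokenMetadataLock.readLock().lock(); // step 5: blocks forever; T2 holds the write lock
            // cloneOnlyTokenMap() would run here, under TokenMetadata's read lock
            tokenMetadataLock.readLock().unlock();
            cacheLock.writeLock().unlock();
        });

        // T2: ring change that invalidates the cache while holding
        // TokenMetadata's write lock.
        Thread t2 = new Thread(() -> {
            tokenMetadataLock.writeLock().lock(); // step 2
            pause();
            cacheLock.writeLock().lock();         // step 4: blocks forever; T1 holds it
            // cache invalidation would run here
            cacheLock.writeLock().unlock();
            tokenMetadataLock.writeLock().unlock();
        });

        t1.start();
        t2.start();
        // Neither thread can make progress: T1 waits on tokenMetadataLock,
        // T2 waits on cacheLock.
    }
}
{code}

One standard way to break such a cycle is to enforce a single acquisition order for both paths, or to fill the cache without holding TokenMetadata's lock across the clone.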