[ https://issues.apache.org/jira/browse/CASSANDRA-6345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jonathan Ellis updated CASSANDRA-6345:
--------------------------------------
    Attachment: 6345-v2.txt

I see, with vnodes we have enough ranges that we can have a thundering herd even if each range only clones once. v2 attached with the approach you described originally.

> Endpoint cache invalidation causes CPU spike (on vnode rings?)
> --------------------------------------------------------------
>
>                 Key: CASSANDRA-6345
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6345
>             Project: Cassandra
>          Issue Type: Bug
>         Environment: 30 nodes total, 2 DCs
>                      Cassandra 1.2.11
>                      vnodes enabled (256 per node)
>            Reporter: Rick Branson
>             Fix For: 1.2.12, 2.0.3
>
>         Attachments: 6345-v2.txt, 6345.txt
>
>
> We've observed that events which cause invalidation of the endpoint cache
> (update keyspace, add/remove nodes, etc.) in AbstractReplicationStrategy
> result in several seconds of thundering-herd behavior on the entire cluster.
> A thread dump shows over a hundred threads (I stopped counting at that point)
> with a backtrace like this:
>     at java.net.Inet4Address.getAddress(Inet4Address.java:288)
>     at org.apache.cassandra.locator.TokenMetadata$1.compare(TokenMetadata.java:106)
>     at org.apache.cassandra.locator.TokenMetadata$1.compare(TokenMetadata.java:103)
>     at java.util.TreeMap.getEntryUsingComparator(TreeMap.java:351)
>     at java.util.TreeMap.getEntry(TreeMap.java:322)
>     at java.util.TreeMap.get(TreeMap.java:255)
>     at com.google.common.collect.AbstractMultimap.put(AbstractMultimap.java:200)
>     at com.google.common.collect.AbstractSetMultimap.put(AbstractSetMultimap.java:117)
>     at com.google.common.collect.TreeMultimap.put(TreeMultimap.java:74)
>     at com.google.common.collect.AbstractMultimap.putAll(AbstractMultimap.java:273)
>     at com.google.common.collect.TreeMultimap.putAll(TreeMultimap.java:74)
>     at org.apache.cassandra.utils.SortedBiMultiValMap.create(SortedBiMultiValMap.java:60)
>     at org.apache.cassandra.locator.TokenMetadata.cloneOnlyTokenMap(TokenMetadata.java:598)
>     at org.apache.cassandra.locator.AbstractReplicationStrategy.getNaturalEndpoints(AbstractReplicationStrategy.java:104)
>     at org.apache.cassandra.service.StorageService.getNaturalEndpoints(StorageService.java:2671)
>     at org.apache.cassandra.service.StorageProxy.performWrite(StorageProxy.java:375)
> It looks like there's a large amount of cost in TokenMetadata.cloneOnlyTokenMap,
> which AbstractReplicationStrategy.getNaturalEndpoints calls each time there is
> a cache miss for an endpoint. This would only impact clusters with large
> numbers of tokens, so it's probably a vnodes-only issue.
> Proposal: in AbstractReplicationStrategy.getNaturalEndpoints(), cache the
> cloned TokenMetadata instance returned by TokenMetadata.cloneOnlyTokenMap(),
> wrapping it with a lock to prevent stampedes, and clear it in
> clearEndpointCache(). Thoughts?

--
This message was sent by Atlassian JIRA
(v6.1#6144)
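For illustration, here is a minimal sketch of the caching-with-a-lock pattern the proposal describes: cache the expensive clone, let only one thread rebuild it after invalidation, and have clearEndpointCache() drop the cached copy. The class and field names (EndpointCacheSketch, TokenMap, cachedClone) are hypothetical stand-ins, not Cassandra's actual implementation; the real patch is in the attached 6345-v2.txt.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch: serialize the clone so a burst of cache misses
// does not trigger a thundering herd of cloneOnlyTokenMap() calls.
class EndpointCacheSketch
{
    // Stand-in for TokenMetadata; clone() is the expensive operation.
    static class TokenMap
    {
        TokenMap cloneOnly()
        {
            return new TokenMap(); // real clone copies the full token->endpoint multimap
        }
    }

    private final TokenMap live = new TokenMap();
    private volatile TokenMap cachedClone;              // null after invalidation
    private final ReentrantLock cloneLock = new ReentrantLock();

    TokenMap getCachedClone()
    {
        TokenMap snapshot = cachedClone;
        if (snapshot != null)
            return snapshot;                            // fast path: no lock, no clone

        cloneLock.lock();                               // only one thread pays for the clone
        try
        {
            if (cachedClone == null)                    // double-check under the lock
                cachedClone = live.cloneOnly();
            return cachedClone;
        }
        finally
        {
            cloneLock.unlock();
        }
    }

    void clearEndpointCache()
    {
        cachedClone = null;                             // next reader re-clones exactly once
    }
}
```

Threads that lose the race block briefly on the lock and then reuse the clone made by the winner, so an invalidation costs one clone per strategy instead of one per in-flight request.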