Rick Branson created CASSANDRA-6345:
---------------------------------------

             Summary: Endpoint cache invalidation causes CPU spike (on vnode rings?)
                 Key: CASSANDRA-6345
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6345
             Project: Cassandra
          Issue Type: Bug
         Environment: 30 nodes total, 2 DCs
                      Cassandra 1.2.11
                      vnodes enabled (256 per node)
            Reporter: Rick Branson


We've observed that events which invalidate the endpoint cache in
AbstractReplicationStrategy (updating a keyspace, adding or removing nodes,
etc.) cause several seconds of thundering-herd behavior across the entire
cluster.

A thread dump shows over a hundred threads (I stopped counting at that point) 
with a backtrace like this:

        at java.net.Inet4Address.getAddress(Inet4Address.java:288)
        at org.apache.cassandra.locator.TokenMetadata$1.compare(TokenMetadata.java:106)
        at org.apache.cassandra.locator.TokenMetadata$1.compare(TokenMetadata.java:103)
        at java.util.TreeMap.getEntryUsingComparator(TreeMap.java:351)
        at java.util.TreeMap.getEntry(TreeMap.java:322)
        at java.util.TreeMap.get(TreeMap.java:255)
        at com.google.common.collect.AbstractMultimap.put(AbstractMultimap.java:200)
        at com.google.common.collect.AbstractSetMultimap.put(AbstractSetMultimap.java:117)
        at com.google.common.collect.TreeMultimap.put(TreeMultimap.java:74)
        at com.google.common.collect.AbstractMultimap.putAll(AbstractMultimap.java:273)
        at com.google.common.collect.TreeMultimap.putAll(TreeMultimap.java:74)
        at org.apache.cassandra.utils.SortedBiMultiValMap.create(SortedBiMultiValMap.java:60)
        at org.apache.cassandra.locator.TokenMetadata.cloneOnlyTokenMap(TokenMetadata.java:598)
        at org.apache.cassandra.locator.AbstractReplicationStrategy.getNaturalEndpoints(AbstractReplicationStrategy.java:104)
        at org.apache.cassandra.service.StorageService.getNaturalEndpoints(StorageService.java:2671)
        at org.apache.cassandra.service.StorageProxy.performWrite(StorageProxy.java:375)

Most of the cost appears to be in TokenMetadata.cloneOnlyTokenMap(), which
AbstractReplicationStrategy.getNaturalEndpoints() calls on every endpoint cache
miss. As a result, every thread that misses right after an invalidation clones
the full token map. This would seem to matter only for clusters with large
numbers of tokens, so it's probably a vnodes-only issue.
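The following is a minimal model of the behavior described above. It assumes a plain TreeMap as a stand-in for TokenMetadata, and the class and method names are hypothetical rather than Cassandra's. Every endpoint-cache miss copies the whole ring before doing a lookup, so N threads that miss together after an invalidation perform N full copies. The TreeMap copy constructor used here runs in linear time, which is cheaper than the comparator-driven TreeMultimap.putAll path in the trace. This model therefore understates the real per-clone cost.

```java
import java.util.*;
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model of the reported hot path, NOT Cassandra code: a TreeMap
// stands in for TokenMetadata, and every endpoint-cache miss copies it in full
// (the role cloneOnlyTokenMap() plays) before computing replicas.
public class PerMissCloneModel {
    private final TreeMap<Long, Integer> liveRing;   // token -> node id
    final AtomicInteger clones = new AtomicInteger();

    PerMissCloneModel(TreeMap<Long, Integer> liveRing) { this.liveRing = liveRing; }

    int naturalEndpointOnMiss(long key) {
        TreeMap<Long, Integer> copy = new TreeMap<>(liveRing); // full copy on every miss
        clones.incrementAndGet();
        Map.Entry<Long, Integer> e = copy.ceilingEntry(key);   // first token >= key
        return (e != null ? e : copy.firstEntry()).getValue(); // wrap around the ring
    }

    // 'threads' concurrent misses released at once; returns total clones so far.
    int simulate(int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CountDownLatch start = new CountDownLatch(1);
        List<Future<Integer>> results = new ArrayList<>();
        try {
            for (int i = 0; i < threads; i++) {
                final long key = i * 1_000_003L;
                results.add(pool.submit(() -> { start.await(); return naturalEndpointOnMiss(key); }));
            }
            start.countDown();
            for (Future<Integer> f : results) f.get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
        return clones.get();
    }

    // Random tokens per node, loosely imitating vnode token allocation.
    static TreeMap<Long, Integer> buildRing(int nodes, int vnodesPerNode) {
        TreeMap<Long, Integer> ring = new TreeMap<>();
        Random rnd = new Random(42);
        for (int node = 0; node < nodes; node++)
            for (int v = 0; v < vnodesPerNode; v++) ring.put(rnd.nextLong(), node);
        return ring;
    }

    public static void main(String[] args) {
        TreeMap<Long, Integer> ring = buildRing(30, 256); // sized like the reported cluster
        PerMissCloneModel m = new PerMissCloneModel(ring);
        int clones = m.simulate(100);
        System.out.println(clones + " clones x " + ring.size() + " tokens = "
                + (long) clones * ring.size() + " entries copied");
    }
}
```

With the reported environment (30 nodes x 256 vnodes = 7,680 tokens), 100 concurrent misses copy 768,000 entries. A 30-node ring with one token per node would copy 3,000 entries for the same number of misses.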

Proposal: In AbstractReplicationStrategy.getNaturalEndpoints(), cache the 
cloned TokenMetadata instance returned by TokenMetadata.cloneOnlyTokenMap(), 
wrapping it with a lock to prevent stampedes, and clearing it in 
clearEndpointCache(). Thoughts?
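The following is a minimal sketch of the proposal under stated assumptions. A plain TreeMap stands in for TokenMetadata. The field cachedTokenMap and the method getCachedTokenMap() are hypothetical names, and clearEndpointCache() here only mirrors the role of the real method. Double-checked locking over a volatile field lets the first thread after an invalidation do the clone, while threads that miss concurrently wait on the lock and then reuse that clone.

```java
import java.util.*;
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of the proposal, NOT Cassandra's implementation. A TreeMap stands in
// for TokenMetadata; cachedTokenMap and getCachedTokenMap() are hypothetical.
public class CachedCloneSketch {
    private final TreeMap<Long, Integer> liveRing;            // token -> node id
    private final Object cloneLock = new Object();
    private volatile SortedMap<Long, Integer> cachedTokenMap; // null means "invalidated"
    final AtomicInteger clones = new AtomicInteger();

    CachedCloneSketch(TreeMap<Long, Integer> liveRing) { this.liveRing = liveRing; }

    // Double-checked locking: the first thread after an invalidation clones;
    // threads that miss concurrently block on the lock, then reuse that clone.
    SortedMap<Long, Integer> getCachedTokenMap() {
        SortedMap<Long, Integer> tm = cachedTokenMap;
        if (tm != null) return tm;                 // fast path, no locking
        synchronized (cloneLock) {
            tm = cachedTokenMap;                   // re-check under the lock
            if (tm == null) {
                tm = Collections.unmodifiableSortedMap(new TreeMap<>(liveRing));
                clones.incrementAndGet();
                cachedTokenMap = tm;
            }
            return tm;
        }
    }

    // Per the proposal: invalidating the endpoint cache also drops the cached clone.
    void clearEndpointCache() {
        synchronized (cloneLock) { cachedTokenMap = null; }
    }

    // Simplified primary-replica lookup: first token >= key, wrapping around.
    int naturalEndpointOnMiss(long key) {
        SortedMap<Long, Integer> tm = getCachedTokenMap();
        SortedMap<Long, Integer> tail = tm.tailMap(key);
        return tail.isEmpty() ? tm.get(tm.firstKey()) : tail.get(tail.firstKey());
    }

    // 'threads' concurrent misses released at once; returns total clones so far.
    int simulate(int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CountDownLatch start = new CountDownLatch(1);
        List<Future<Integer>> results = new ArrayList<>();
        try {
            for (int i = 0; i < threads; i++) {
                final long key = i * 1_000_003L;
                results.add(pool.submit(() -> { start.await(); return naturalEndpointOnMiss(key); }));
            }
            start.countDown();
            for (Future<Integer> f : results) f.get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
        return clones.get();
    }

    static TreeMap<Long, Integer> buildRing(int nodes, int vnodesPerNode) {
        TreeMap<Long, Integer> ring = new TreeMap<>();
        Random rnd = new Random(42);
        for (int node = 0; node < nodes; node++)
            for (int v = 0; v < vnodesPerNode; v++) ring.put(rnd.nextLong(), node);
        return ring;
    }

    public static void main(String[] args) {
        CachedCloneSketch s = new CachedCloneSketch(buildRing(30, 256));
        System.out.println("clones after 100 concurrent misses: " + s.simulate(100));
        s.clearEndpointCache(); // e.g. after a keyspace update or ring change
        System.out.println("clones after invalidation + 100 more: " + s.simulate(100));
    }
}
```

Run as written, main reports 1 clone after the first round of 100 concurrent misses and 2 after the invalidation and second round. A lock-free variant could let racing threads clone and publish with compareAndSet. However, that reintroduces the duplicate cloning the lock is meant to prevent.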



--
This message was sent by Atlassian JIRA
(v6.1#6144)