[ 
https://issues.apache.org/jira/browse/CASSANDRA-6345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13827319#comment-13827319
 ] 

Jonathan Ellis commented on CASSANDRA-6345:
-------------------------------------------

bq. It seems as if just setting the cache to empty would allow a period of time 
where TokenMetadata write methods had returned but not all threads have seen 
the mutation yet

I'm not 100% sure this is what you're talking about, but I see this problem 
with the existing code (and my v3):

{noformat}
Thread 1                 Thread 2        
getNaturalEndpoints      
cloneOnlyTokenMap        
                         invalidateCachedTokenEndpointValues
endpoints = calculate
cacheEndpoint [based on the now-invalidated token map]
{noformat}

So it doesn't quite work.  We'd need to introduce another AtomicReference on 
the cache, so that invalidate could create a new Map (so it doesn't matter if 
someone updates the old one).  But I think you're right that getting rid of the 
callback approach entirely is better.
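
To make the AtomicReference idea concrete, here is a minimal sketch. It is not the Cassandra code: the class name EndpointCache, the use of Long for tokens, String for endpoints, and the calculate callback are all hypothetical stand-ins. A reader captures the current map before calculating, and invalidate swaps in a fresh map, so a late write from Thread 1 lands in the discarded map rather than the live one.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Function;

// Hypothetical illustration only; names are not taken from the Cassandra source.
public class EndpointCache {
    // The reference is swapped on invalidation instead of clearing the map in place.
    private final AtomicReference<Map<Long, List<String>>> cache =
            new AtomicReference<>(new ConcurrentHashMap<>());

    public List<String> get(long token, Function<Long, List<String>> calculate) {
        // Capture the map once, before calculating.
        Map<Long, List<String>> snapshot = cache.get();
        List<String> endpoints = snapshot.get(token);
        if (endpoints == null) {
            endpoints = calculate.apply(token);
            // If invalidate() ran in the meantime, this write goes to the
            // old, discarded map and never pollutes the live cache.
            snapshot.put(token, endpoints);
        }
        return endpoints;
    }

    public void invalidate() {
        // Replace the map rather than emptying it.
        cache.set(new ConcurrentHashMap<>());
    }
}
```

Note that the caller that raced with invalidate still gets the stale result for that one call; the sketch only guarantees that the stale value is not cached for later callers.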

> Endpoint cache invalidation causes CPU spike (on vnode rings?)
> --------------------------------------------------------------
>
>                 Key: CASSANDRA-6345
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6345
>             Project: Cassandra
>          Issue Type: Bug
>         Environment: 30 nodes total, 2 DCs
> Cassandra 1.2.11
> vnodes enabled (256 per node)
>            Reporter: Rick Branson
>            Assignee: Jonathan Ellis
>             Fix For: 1.2.12, 2.0.3
>
>         Attachments: 6345-rbranson-v2.txt, 6345-rbranson.txt, 6345-v2.txt, 
> 6345-v3.txt, 6345.txt, half-way-thru-6345-rbranson-patch-applied.png
>
>
> We've observed that events which cause invalidation of the endpoint cache 
> (update keyspace, add/remove nodes, etc) in AbstractReplicationStrategy 
> result in several seconds of thundering herd behavior on the entire cluster. 
> A thread dump shows over a hundred threads (I stopped counting at that point) 
> with a backtrace like this:
>         at java.net.Inet4Address.getAddress(Inet4Address.java:288)
>         at org.apache.cassandra.locator.TokenMetadata$1.compare(TokenMetadata.java:106)
>         at org.apache.cassandra.locator.TokenMetadata$1.compare(TokenMetadata.java:103)
>         at java.util.TreeMap.getEntryUsingComparator(TreeMap.java:351)
>         at java.util.TreeMap.getEntry(TreeMap.java:322)
>         at java.util.TreeMap.get(TreeMap.java:255)
>         at com.google.common.collect.AbstractMultimap.put(AbstractMultimap.java:200)
>         at com.google.common.collect.AbstractSetMultimap.put(AbstractSetMultimap.java:117)
>         at com.google.common.collect.TreeMultimap.put(TreeMultimap.java:74)
>         at com.google.common.collect.AbstractMultimap.putAll(AbstractMultimap.java:273)
>         at com.google.common.collect.TreeMultimap.putAll(TreeMultimap.java:74)
>         at org.apache.cassandra.utils.SortedBiMultiValMap.create(SortedBiMultiValMap.java:60)
>         at org.apache.cassandra.locator.TokenMetadata.cloneOnlyTokenMap(TokenMetadata.java:598)
>         at org.apache.cassandra.locator.AbstractReplicationStrategy.getNaturalEndpoints(AbstractReplicationStrategy.java:104)
>         at org.apache.cassandra.service.StorageService.getNaturalEndpoints(StorageService.java:2671)
>         at org.apache.cassandra.service.StorageProxy.performWrite(StorageProxy.java:375)
>
> It looks like there's a large amount of cost in the 
> TokenMetadata.cloneOnlyTokenMap that 
> AbstractReplicationStrategy.getNaturalEndpoints is calling each time there is 
> a cache miss for an endpoint. It seems as if this would only impact clusters 
> with large numbers of tokens, so it's probably a vnodes-only issue.
>
> Proposal: In AbstractReplicationStrategy.getNaturalEndpoints(), cache the 
> cloned TokenMetadata instance returned by TokenMetadata.cloneOnlyTokenMap(), 
> wrapping it with a lock to prevent stampedes, and clearing it in 
> clearEndpointCache(). Thoughts?
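
The proposal in the description (cache the clone, guard it with a lock, clear it on invalidation) could look roughly like the sketch below. This is an assumption-laden illustration, not the attached patch: CachedClone, its cloner supplier, and clear() are hypothetical names, with the type parameter T standing in for the cloned TokenMetadata.

```java
import java.util.function.Supplier;

// Hypothetical illustration; T stands in for the cloned TokenMetadata.
public class CachedClone<T> {
    private final Supplier<T> cloner; // stand-in for the expensive clone call
    private volatile T cached;

    public CachedClone(Supplier<T> cloner) {
        this.cloner = cloner;
    }

    public T get() {
        T result = cached;
        if (result == null) {
            // Only one thread performs the expensive clone; the others wait
            // on the lock and then reuse its result instead of stampeding.
            synchronized (this) {
                result = cached;
                if (result == null) {
                    result = cloner.get();
                    cached = result;
                }
            }
        }
        return result;
    }

    // Called on invalidation; holding the same lock means an in-progress
    // clone is stored first and then discarded, never left behind stale.
    public synchronized void clear() {
        cached = null;
    }
}
```

With this shape, a burst of cache misses after an invalidation pays for one clone rather than one clone per thread.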



--
This message was sent by Atlassian JIRA
(v6.1#6144)
