[ https://issues.apache.org/jira/browse/CASSANDRA-6345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13832783#comment-13832783 ]
Rick Branson commented on CASSANDRA-6345:
-----------------------------------------
Thanks for taking the time to explain the consistency story. It makes perfect
sense.

My defensiveness comment suggested bumping the version number (this is
practically free) each time the TokenMetadata (TM) write lock is released, in
addition to the existing invalidations. You're probably a much better judge of
how useful this is, so it's up to you.

Really nice that the v5 patch is so compact. Two minor comments: the
endpointsLock declaration is still in there, and, not to be nitpicky, there
are two typos in the comments ("wo we keep" and "clone got invalidted").
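
To make the version-bump suggestion above concrete, here is a minimal sketch. It is not taken from any of the attached patches: VersionedTokenMap is a hypothetical stand-in for TokenMetadata, and the field and method names are assumptions made for illustration. The only point is that the counter is incremented in the same finally block that releases the write lock, so no write section can finish without bumping it.

```java
import java.util.TreeMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for TokenMetadata; not the real Cassandra class.
final class VersionedTokenMap {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final AtomicLong version = new AtomicLong();
    private final TreeMap<Long, String> tokenToEndpoint = new TreeMap<>();

    // Readers compare this against the version they cached a clone at.
    long version() {
        return version.get();
    }

    void updateToken(long token, String endpoint) {
        lock.writeLock().lock();
        try {
            tokenToEndpoint.put(token, endpoint);
        } finally {
            // Bump on every release of the write lock, even if the
            // mutation above threw, then unlock.
            version.incrementAndGet();
            lock.writeLock().unlock();
        }
    }

    // Copy taken under the read lock, analogous in spirit to a token map clone.
    TreeMap<Long, String> cloneTokenMap() {
        lock.readLock().lock();
        try {
            return new TreeMap<>(tokenToEndpoint);
        } finally {
            lock.readLock().unlock();
        }
    }
}
```

A cached clone would then be treated as stale whenever the version it was taken at differs from the current one, independent of whether an explicit invalidation call was made.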
> Endpoint cache invalidation causes CPU spike (on vnode rings?)
> --------------------------------------------------------------
>
> Key: CASSANDRA-6345
> URL: https://issues.apache.org/jira/browse/CASSANDRA-6345
> Project: Cassandra
> Issue Type: Bug
> Environment: 30 nodes total, 2 DCs
> Cassandra 1.2.11
> vnodes enabled (256 per node)
> Reporter: Rick Branson
> Assignee: Jonathan Ellis
> Fix For: 1.2.13
>
> Attachments: 6345-rbranson-v2.txt, 6345-rbranson.txt, 6345-v2.txt,
> 6345-v3.txt, 6345-v4.txt, 6345-v5.txt, 6345.txt,
> half-way-thru-6345-rbranson-patch-applied.png
>
>
> We've observed that events which cause invalidation of the endpoint cache
> (update keyspace, add/remove nodes, etc) in AbstractReplicationStrategy
> result in several seconds of thundering herd behavior on the entire cluster.
> A thread dump shows over a hundred threads (I stopped counting at that point)
> with a backtrace like this:
> at java.net.Inet4Address.getAddress(Inet4Address.java:288)
> at org.apache.cassandra.locator.TokenMetadata$1.compare(TokenMetadata.java:106)
> at org.apache.cassandra.locator.TokenMetadata$1.compare(TokenMetadata.java:103)
> at java.util.TreeMap.getEntryUsingComparator(TreeMap.java:351)
> at java.util.TreeMap.getEntry(TreeMap.java:322)
> at java.util.TreeMap.get(TreeMap.java:255)
> at com.google.common.collect.AbstractMultimap.put(AbstractMultimap.java:200)
> at com.google.common.collect.AbstractSetMultimap.put(AbstractSetMultimap.java:117)
> at com.google.common.collect.TreeMultimap.put(TreeMultimap.java:74)
> at com.google.common.collect.AbstractMultimap.putAll(AbstractMultimap.java:273)
> at com.google.common.collect.TreeMultimap.putAll(TreeMultimap.java:74)
> at org.apache.cassandra.utils.SortedBiMultiValMap.create(SortedBiMultiValMap.java:60)
> at org.apache.cassandra.locator.TokenMetadata.cloneOnlyTokenMap(TokenMetadata.java:598)
> at org.apache.cassandra.locator.AbstractReplicationStrategy.getNaturalEndpoints(AbstractReplicationStrategy.java:104)
> at org.apache.cassandra.service.StorageService.getNaturalEndpoints(StorageService.java:2671)
> at org.apache.cassandra.service.StorageProxy.performWrite(StorageProxy.java:375)
> It looks like there's a large amount of cost in the
> TokenMetadata.cloneOnlyTokenMap that
> AbstractReplicationStrategy.getNaturalEndpoints is calling each time there is
> a cache miss for an endpoint. It seems as if this would only impact clusters
> with large numbers of tokens, so it's probably a vnodes-only issue.
>
> Proposal: In AbstractReplicationStrategy.getNaturalEndpoints(), cache the
> cloned TokenMetadata instance returned by TokenMetadata.cloneOnlyTokenMap(),
> wrapping it with a lock to prevent stampedes, and clearing it in
> clearEndpointCache(). Thoughts?
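
The proposal in the description can be sketched roughly as follows. This is an illustration only, not the code in any attached patch: CachedClone is a hypothetical helper, and the Supplier stands in for a call such as TokenMetadata.cloneOnlyTokenMap(). It assumes the supplier never returns null. After an invalidation, the first caller rebuilds the clone while holding the monitor; concurrent callers wait and then reuse that result instead of each paying for their own clone.

```java
import java.util.function.Supplier;

// Hypothetical helper; the names here are assumptions, not Cassandra APIs.
final class CachedClone<T> {
    private final Supplier<T> cloner; // stands in for the expensive clone call
    private volatile T cached;

    CachedClone(Supplier<T> cloner) {
        this.cloner = cloner;
    }

    T get() {
        T local = cached;
        if (local == null) {
            // Only one thread rebuilds; the rest block here and then
            // pick up the value it stored.
            synchronized (this) {
                local = cached;
                if (local == null) {
                    local = cloner.get();
                    cached = local;
                }
            }
        }
        return local;
    }

    // Called from the invalidation path (clearEndpointCache() in the proposal).
    // Synchronized so it cannot interleave with a rebuild in progress.
    synchronized void clear() {
        cached = null;
    }
}
```

With this shape, a burst of cache misses after an invalidation triggers a single clone rather than one per thread, which is the stampede the thread dump shows.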
--
This message was sent by Atlassian JIRA
(v6.1#6144)