[ https://issues.apache.org/jira/browse/CASSANDRA-6488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Aleksey Yeschenko updated CASSANDRA-6488:
-----------------------------------------
    Attachment: 6488-fix.txt

Separates TM.cloneOnlyTokenMap() and TM.cachedOnlyTokenMap(), and switches only
SP.getBatchlogEndpoints() and ARS.getNaturalEndpoints() to the cached version.
They aren't the only methods that *don't* mutate the returned metadata, but
going through the rest of the usages and optimizing those can wait.

Also fixes a regression from 6435 in TM.cachedOnlyTokenMap().

> Batchlog writes consume unnecessarily large amounts of CPU on vnodes clusters
> -----------------------------------------------------------------------------
>
>                 Key: CASSANDRA-6488
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6488
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Rick Branson
>            Assignee: Rick Branson
>             Fix For: 1.2.13, 2.0.4
>
>         Attachments: 6488-fix.txt, 6488-rbranson-patch.txt, 6488-v2.txt,
> 6488-v3.txt, graph (21).png
>
>
> The cloneOnlyTokenMap call in StorageProxy.getBatchlogEndpoints causes
> enormous amounts of CPU to be consumed on clusters with many vnodes. I
> created a patch to cache this data as a workaround and deployed it to a
> production cluster with 15,000 tokens. CPU consumption dropped to 1/5th. This
> highlights the overall issues with cloneOnlyTokenMap() calls on vnodes
> clusters. I'm including the maybe-not-the-best-quality workaround patch to
> use as a reference, but cloneOnlyTokenMap is a systemic issue and every place
> it's called should probably be investigated.

--
This message was sent by Atlassian JIRA
(v6.1.4#6159)
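
For readers unfamiliar with the distinction the update draws between cloning the token map on every call and handing out a cached copy to callers that never mutate it, here is a minimal sketch of the general idea. It is not the code from 6488-fix.txt and does not use Cassandra's TokenMetadata class: TokenRing, addToken(), cloneTokens(), cachedTokens() and copyCount() are hypothetical names invented for illustration, and wrapping the snapshot in an unmodifiable view is an assumption of this sketch.

```java
import java.util.Collections;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for a token -> endpoint ring; NOT Cassandra's TokenMetadata.
final class TokenRing {
    private final TreeMap<Long, String> tokenToEndpoint = new TreeMap<>();
    // Shared read-only snapshot; null means "stale, rebuild on next read".
    private final AtomicReference<SortedMap<Long, String>> cached = new AtomicReference<>();
    private int copies = 0; // counts full copies, for illustration only

    synchronized void addToken(long token, String endpoint) {
        tokenToEndpoint.put(token, endpoint);
        cached.set(null); // any ring change invalidates the snapshot
    }

    // Always pays for a full copy; the caller owns the result and may mutate it.
    synchronized SortedMap<Long, String> cloneTokens() {
        copies++;
        return new TreeMap<>(tokenToEndpoint);
    }

    // Copies at most once per ring change; callers must treat the result as read-only.
    SortedMap<Long, String> cachedTokens() {
        SortedMap<Long, String> snapshot = cached.get();
        if (snapshot != null) {
            return snapshot;
        }
        synchronized (this) {
            snapshot = cached.get();
            if (snapshot == null) {
                // Assumption of this sketch: enforce read-only use with an unmodifiable view.
                snapshot = Collections.unmodifiableSortedMap(cloneTokens());
                cached.set(snapshot);
            }
            return snapshot;
        }
    }

    synchronized int copyCount() {
        return copies;
    }
}
```

With many tokens, a hot path that calls cloneTokens() per request copies the whole map each time, whereas cachedTokens() copies only after the ring changes. The cost is that callers of the cached version share one instance, which is why only call sites that do not mutate the result can be switched over.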