[ https://issues.apache.org/jira/browse/CASSANDRA-6992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jonathan Ellis updated CASSANDRA-6992:
--------------------------------------
Assignee: Paulo Motta (was: Yuki Morishita)
> Bootstrap on vnodes clusters can cause stampeding/storm behavior
> ----------------------------------------------------------------
>
> Key: CASSANDRA-6992
> URL: https://issues.apache.org/jira/browse/CASSANDRA-6992
> Project: Cassandra
> Issue Type: Improvement
> Environment: Various vnodes-enabled clusters in EC2, m1.xlarge and
> hi1.4xlarge, ~3000-8000 tokens.
> Reporter: Rick Branson
> Assignee: Paulo Motta
> Priority: Minor
>
> I'm assuming this is specific to vnodes clusters because
> SSTableReader#getPositionsForRanges is more expensive to compute with 256x
> the ranges, but I could be wrong. Even on well-provisioned hosts, a bootstrap
> can cause a severe spike in network throughput and CPU utilization from a
> storm of flushes, which hurts long-tail latency pretty badly. On weaker hosts
> (like m1.xlarge with ~500GB of data), it can result in minutes of churn while
> the node gets through StreamOut#createPendingFiles. This *might* be better in
> 2.0, but it's probably still reproducible because the bootstrapping node
> sends out all of its streaming requests at once.
> I'm thinking the requests could be staggered at the bootstrapping node to
> avoid the simultaneous spike across the whole cluster. I'm not sure how to
> stagger them beyond something very naive, like one at a time with a pause.
> Maybe the work should also be throttled in StreamOut#createPendingFiles on
> the out-streaming host? Any thoughts?
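
To make the one-at-a-time idea concrete, here is a minimal sketch in plain Java. It is not Cassandra code: StaggeredStreamRequester, requestAll, and the sendRequest callback are hypothetical names, and the sources are plain strings standing in for endpoints. It only shows the naive shape described in the report: send one request, pause, then send the next.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/**
 * Hypothetical sketch, not part of Cassandra: contacts stream sources one at
 * a time with a fixed pause in between, instead of all at once.
 */
final class StaggeredStreamRequester {
    private final long pauseMillis;

    StaggeredStreamRequester(long pauseMillis) {
        if (pauseMillis < 0) {
            throw new IllegalArgumentException("pauseMillis must be >= 0");
        }
        this.pauseMillis = pauseMillis;
    }

    /**
     * Invokes sendRequest for each source in list order, sleeping between
     * consecutive requests (no sleep before the first or after the last).
     * Stops early if the thread is interrupted while pausing.
     *
     * @return the sources that were actually contacted, in order
     */
    List<String> requestAll(List<String> sources, Consumer<String> sendRequest) {
        List<String> contacted = new ArrayList<>();
        for (int i = 0; i < sources.size(); i++) {
            if (i > 0 && !pause()) {
                break; // interrupted: leave the remaining sources uncontacted
            }
            String source = sources.get(i);
            sendRequest.accept(source); // assumption: stands in for the real stream request
            contacted.add(source);
        }
        return contacted;
    }

    /** Sleeps for pauseMillis; returns false if interrupted. */
    private boolean pause() {
        try {
            Thread.sleep(pauseMillis);
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for callers
            return false;
        }
    }
}
```

A real version would probably wait for each source to finish preparing its files rather than sleep for a fixed interval; the fixed pause is only the simplest stand-in for that.
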
> From the stack dump of one of our weaker nodes that was struggling for a few
> minutes just starting the StreamOut:
> "MiscStage:1" daemon prio=10 tid=0x000000000292f000 nid=0x688 runnable [0x00007f7b03df6000]
>    java.lang.Thread.State: RUNNABLE
>         at org.apache.cassandra.utils.ByteBufferUtil.readShortLength(ByteBufferUtil.java:361)
>         at org.apache.cassandra.utils.ByteBufferUtil.readWithShortLength(ByteBufferUtil.java:371)
>         at org.apache.cassandra.io.sstable.IndexHelper$IndexInfo.deserialize(IndexHelper.java:187)
>         at org.apache.cassandra.db.RowIndexEntry$Serializer.deserialize(RowIndexEntry.java:125)
>         at org.apache.cassandra.io.sstable.SSTableReader.getPosition(SSTableReader.java:889)
>         at org.apache.cassandra.io.sstable.SSTableReader.getPosition(SSTableReader.java:790)
>         at org.apache.cassandra.io.sstable.SSTableReader.getPositionsForRanges(SSTableReader.java:730)
>         at org.apache.cassandra.streaming.StreamOut.createPendingFiles(StreamOut.java:172)
>         at org.apache.cassandra.streaming.StreamOut.transferSSTables(StreamOut.java:157)
>         at org.apache.cassandra.streaming.StreamOut.transferRanges(StreamOut.java:148)
>         at org.apache.cassandra.streaming.StreamOut.transferRanges(StreamOut.java:116)
>         at org.apache.cassandra.streaming.StreamRequestVerbHandler.doVerb(StreamRequestVerbHandler.java:44)
>         at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:56)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
>         at java.lang.Thread.run(Thread.java:662)