[ https://issues.apache.org/jira/browse/CASSANDRA-6992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-6992:
--------------------------------------
    Assignee: Paulo Motta  (was: Yuki Morishita)

> Bootstrap on vnodes clusters can cause stampeding/storm behavior
> ----------------------------------------------------------------
>
>                 Key: CASSANDRA-6992
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6992
>             Project: Cassandra
>          Issue Type: Improvement
>         Environment: Various vnodes-enabled clusters in EC2, m1.xlarge and 
> hi1.4xlarge, ~3000-8000 tokens.
>            Reporter: Rick Branson
>            Assignee: Paulo Motta
>            Priority: Minor
>
> I'm assuming this is specific to vnodes clusters because 
> SSTableReader#getPositionsForRanges is more expensive to compute with 256x 
> the ranges, but I could be wrong. Even on well-provisioned hosts, this can 
> cause a severe spike in network throughput & CPU utilization from a storm of 
> flushes, which hurts long-tail latencies pretty badly. On weaker hosts (like 
> m1.xlarge with ~500GB of data), it can result in minutes of churn while the 
> node gets through StreamOut#createPendingFiles. This *might* be better in 
> 2.0, but it's probably still reproducible because the bootstrapping node 
> sends out all of its streaming requests at once. 
> I'm thinking that the requests could be staggered at the bootstrapping node 
> to avoid the simultaneous spike across the whole cluster. I'm not sure how to 
> stagger them besides something very naive like one-at-a-time with a pause. 
> Maybe this should also be throttled in StreamOut#createPendingFiles on the 
> out-streaming host? Any thoughts?
> From the stack dump of one of our weaker nodes that was struggling for a few 
> minutes just starting the StreamOut:
> "MiscStage:1" daemon prio=10 tid=0x000000000292f000 nid=0x688 runnable [0x00007f7b03df6000]
>    java.lang.Thread.State: RUNNABLE
>         at org.apache.cassandra.utils.ByteBufferUtil.readShortLength(ByteBufferUtil.java:361)
>         at org.apache.cassandra.utils.ByteBufferUtil.readWithShortLength(ByteBufferUtil.java:371)
>         at org.apache.cassandra.io.sstable.IndexHelper$IndexInfo.deserialize(IndexHelper.java:187)
>         at org.apache.cassandra.db.RowIndexEntry$Serializer.deserialize(RowIndexEntry.java:125)
>         at org.apache.cassandra.io.sstable.SSTableReader.getPosition(SSTableReader.java:889)
>         at org.apache.cassandra.io.sstable.SSTableReader.getPosition(SSTableReader.java:790)
>         at org.apache.cassandra.io.sstable.SSTableReader.getPositionsForRanges(SSTableReader.java:730)
>         at org.apache.cassandra.streaming.StreamOut.createPendingFiles(StreamOut.java:172)
>         at org.apache.cassandra.streaming.StreamOut.transferSSTables(StreamOut.java:157)
>         at org.apache.cassandra.streaming.StreamOut.transferRanges(StreamOut.java:148)
>         at org.apache.cassandra.streaming.StreamOut.transferRanges(StreamOut.java:116)
>         at org.apache.cassandra.streaming.StreamRequestVerbHandler.doVerb(StreamRequestVerbHandler.java:44)
>         at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:56)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
>         at java.lang.Thread.run(Thread.java:662)
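
The "one-at-a-time with a pause" idea from the description could look roughly
like the sketch below. This is only an illustration of the naive staggering
approach, not Cassandra code: the class StaggeredRequests, the method
sendStaggered, and the pauseMillis parameter are all hypothetical names, and
the "send" callback stands in for whatever actually issues a stream request
to one source node.

```java
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical helper, not part of Cassandra: issues requests one at a time. */
public class StaggeredRequests {

    /**
     * Calls send once per source, in list order, sleeping pauseMillis between
     * consecutive calls (there is no sleep before the first call or after the
     * last). If the thread is interrupted while pausing, the interrupt flag is
     * restored and no further requests are sent.
     *
     * @return the number of requests actually sent
     */
    public static <T> int sendStaggered(List<T> sources,
                                        Consumer<? super T> send,
                                        long pauseMillis) {
        int sent = 0;
        for (T source : sources) {
            if (sent > 0) {
                try {
                    Thread.sleep(pauseMillis);
                } catch (InterruptedException e) {
                    // Preserve the interrupt for the caller and stop early.
                    Thread.currentThread().interrupt();
                    break;
                }
            }
            send.accept(source);
            sent++;
        }
        return sent;
    }
}
```

With this shape, the bootstrapping node would spread the work of
getPositionsForRanges across the cluster over time instead of triggering it
on every source at once, at the cost of a longer bootstrap. A fixed pause is
the simplest choice; waiting for each source to acknowledge before contacting
the next would be a possible variation.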



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
