[jira] [Commented] (CASSANDRA-6992) Bootstrap on vnodes clusters can cause stampeding/storm behavior

Paulo Motta (JIRA) Fri, 24 Jul 2015 17:22:37 -0700

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-6992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14641284#comment-14641284
 ]


Paulo Motta commented on CASSANDRA-6992:
----------------------------------------

I tried reproducing this on my SSD laptop with a 3-node ccm cluster (10GB each), 
but CPU/IO capacity is quickly exhausted there, so load and latency are affected 
during the whole bootstrapping/streaming process, not only during the 
preparation phase. The problem will definitely be more apparent with large 
overloaded clusters and heterogeneous workloads, and even more so with hard 
disks.

I haven't identified major changes in this section of the code (except for the 
new streaming protocol) since 1.2, so I think we can assume the limitation is 
still present on 2.1+ and move on with the implementation.

I think a simple algorithm that waits until a streaming session is 
established before starting the session with the next node should be sufficient 
to prevent storms during bootstrap. We could also provide two tuning properties:

bootstrap_staggering_concurrency: 1  # number of nodes to establish bootstrap streaming sessions with in parallel
bootstrap_staggering_interval_seconds: 60  # wait time before establishing a session with the next node
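
To make the proposal concrete, here is a rough sketch of the staggering loop. 
It is not Cassandra code: the class StaggeredSessionStarter and the establish 
callback are hypothetical names, and the way the two settings interact (a pause 
before each subsequent node, plus a cap on sessions still being established) is 
my assumption rather than a settled design.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

/** Hypothetical sketch of staggered session establishment; not an existing Cassandra class. */
final class StaggeredSessionStarter<N> {
    private final int concurrency;      // cf. bootstrap_staggering_concurrency
    private final long intervalMillis;  // cf. bootstrap_staggering_interval_seconds, in ms here

    StaggeredSessionStarter(int concurrency, long intervalMillis) {
        if (concurrency < 1 || intervalMillis < 0)
            throw new IllegalArgumentException("concurrency must be >= 1 and interval >= 0");
        this.concurrency = concurrency;
        this.intervalMillis = intervalMillis;
    }

    /**
     * Contacts the source nodes in order. The assumed callback `establish` returns a
     * future that completes once the streaming session with that node is established.
     * At most `concurrency` sessions are in the "being established" state at any time.
     */
    List<CompletableFuture<Void>> startAll(List<N> sources,
                                           Function<N, CompletableFuture<Void>> establish) {
        Semaphore slots = new Semaphore(concurrency);
        List<CompletableFuture<Void>> sessions = new ArrayList<>();
        for (int i = 0; i < sources.size(); i++) {
            if (i > 0 && intervalMillis > 0)
                pause(intervalMillis);          // wait before moving on to the next node
            slots.acquireUninterruptibly();     // block while too many sessions are pending
            CompletableFuture<Void> established;
            try {
                established = establish.apply(sources.get(i));
            } catch (RuntimeException e) {
                slots.release();                // do not leak the slot if the request fails
                throw e;
            }
            // Free the slot once the session is established, or has failed to establish.
            established.whenComplete((ok, err) -> slots.release());
            sessions.add(established);
        }
        return sessions;
    }

    private static void pause(long millis) {
        try {
            TimeUnit.MILLISECONDS.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while staggering sessions", e);
        }
    }
}
```

With concurrency 1 this degenerates to the one-at-a-time behavior described 
above; larger values trade a bigger simultaneous spike for a shorter bootstrap.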

Any thoughts? Which version should we aim for? [~jbellis]

> Bootstrap on vnodes clusters can cause stampeding/storm behavior
> ----------------------------------------------------------------
>
>                 Key: CASSANDRA-6992
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6992
>             Project: Cassandra
>          Issue Type: Improvement
>         Environment: Various vnodes-enabled clusters in EC2, m1.xlarge and 
> hi1.4xlarge, ~3000-8000 tokens.
>            Reporter: Rick Branson
>            Assignee: Paulo Motta
>            Priority: Minor
>
> Assuming this is an issue with vnodes clusters because 
> SSTableReader#getPositionsForRanges is more expensive to compute with 256x 
> the ranges, but could be wrong. On even well-provisioned hosts, this can 
> cause a severe spike in network throughput & CPU utilization from a storm of 
> flushes, which impacts long-tail times pretty badly. On weaker hosts (like 
> m1.xlarge with ~500GB of data), it can result in minutes of churn while the 
> node gets through StreamOut#createPendingFiles. This *might* be better in 
> 2.0, but it's probably still reproducible because the bootstrapping node 
> sends out all of its streaming requests at once. 
> I'm thinking that this could be staggered at the bootstrapping node to avoid 
> the simultaneous spike across the whole cluster. Not sure on how to stagger 
> it besides something very naive like one-at-a-time with a pause. Maybe this 
> should also be throttled in StreamOut#createPendingFiles on the out-streaming 
> host? Any thoughts?
> From the stack dump of one of our weaker nodes that was struggling for a few 
> minutes just starting the StreamOut:
> "MiscStage:1" daemon prio=10 tid=0x000000000292f000 nid=0x688 runnable [0x00007f7b03df6000]
>    java.lang.Thread.State: RUNNABLE
>         at org.apache.cassandra.utils.ByteBufferUtil.readShortLength(ByteBufferUtil.java:361)
>         at org.apache.cassandra.utils.ByteBufferUtil.readWithShortLength(ByteBufferUtil.java:371)
>         at org.apache.cassandra.io.sstable.IndexHelper$IndexInfo.deserialize(IndexHelper.java:187)
>         at org.apache.cassandra.db.RowIndexEntry$Serializer.deserialize(RowIndexEntry.java:125)
>         at org.apache.cassandra.io.sstable.SSTableReader.getPosition(SSTableReader.java:889)
>         at org.apache.cassandra.io.sstable.SSTableReader.getPosition(SSTableReader.java:790)
>         at org.apache.cassandra.io.sstable.SSTableReader.getPositionsForRanges(SSTableReader.java:730)
>         at org.apache.cassandra.streaming.StreamOut.createPendingFiles(StreamOut.java:172)
>         at org.apache.cassandra.streaming.StreamOut.transferSSTables(StreamOut.java:157)
>         at org.apache.cassandra.streaming.StreamOut.transferRanges(StreamOut.java:148)
>         at org.apache.cassandra.streaming.StreamOut.transferRanges(StreamOut.java:116)
>         at org.apache.cassandra.streaming.StreamRequestVerbHandler.doVerb(StreamRequestVerbHandler.java:44)
>         at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:56)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
>         at java.lang.Thread.run(Thread.java:662)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)