[
https://issues.apache.org/jira/browse/STORM-1190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14996983#comment-14996983
]
Robert Joseph Evans commented on STORM-1190:
--------------------------------------------
OK, I am very disappointed with modern-day operating systems. I ran the
following code on a MacBook Pro. With a 1 ms sleep, 200 threads used about
half of the CPU, and going to 300 threads was more or less a DoS on the box.
This is very similar to what we are doing in the batching code. Each
disruptor queue has a dedicated Timer thread that sleeps for 1 ms and then
tries to flush anything in the batch. Each bolt/spout instance has 2 disruptor
queues, so 100 bolt/spout instances on a single box means 200 timer threads
and, in this case, about 50% of the CPU going to sleeping. I'll see what I can
do to make Storm not use quite so many threads when it does not need them.
{code}
// Usage: java Test [sleepTimeMs] [totalTimeSec] [totalThreads]
public class Test extends Thread {
    final long _expectedEnd;
    final long _sleepTime;

    public Test(long ee, long st) {
        _expectedEnd = ee;
        _sleepTime = st;
    }

    // Sleep in short intervals until the deadline; the thread does no other work.
    @Override
    public void run() {
        try {
            while (System.currentTimeMillis() < _expectedEnd) {
                Thread.sleep(_sleepTime);
            }
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) throws Exception {
        long sleepTime = 1;
        if (args.length > 0) {
            sleepTime = Long.parseLong(args[0]);
        }
        long totalTimeSec = 100;
        if (args.length > 1) {
            totalTimeSec = Long.parseLong(args[1]);
        }
        int totalThreads = 10;
        if (args.length > 2) {
            totalThreads = Integer.parseInt(args[2]);
        }
        long totalTimeMs = totalTimeSec * 1000;
        long expectedEnd = System.currentTimeMillis() + totalTimeMs;
        int ret = -1;
        try {
            // Start all sleeper threads, then wait for every one to finish.
            Test[] tests = new Test[totalThreads];
            for (int i = 0; i < totalThreads; i++) {
                tests[i] = new Test(expectedEnd, sleepTime);
                tests[i].start();
            }
            for (int i = 0; i < totalThreads; i++) {
                tests[i].join();
            }
            ret = 0;
        } finally {
            System.exit(ret);
        }
    }
}
{code}
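For illustration only, here is a minimal sketch of one way to cut the thread
count: a single shared timer thread that flushes every registered queue,
instead of one sleeping thread per queue. The names SharedFlusher, register,
flushAll, start and stop are hypothetical and are not Storm APIs, and this is
not necessarily the change that will be made for this issue.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical helper (not part of Storm): one timer thread serves all queues.
class SharedFlusher {
    // Each queue registers a callback that flushes its pending batch.
    private final List<Runnable> flushCallbacks = new CopyOnWriteArrayList<>();

    // A single daemon thread replaces the per-queue timer threads.
    private final ScheduledExecutorService timer =
        Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "shared-flusher");
            t.setDaemon(true);
            return t;
        });

    void register(Runnable flushCallback) {
        flushCallbacks.add(flushCallback);
    }

    // Run every registered flush once. Exceptions are caught per callback,
    // because an exception escaping a scheduleAtFixedRate task cancels all
    // of its later runs.
    void flushAll() {
        for (Runnable callback : flushCallbacks) {
            try {
                callback.run();
            } catch (RuntimeException e) {
                System.err.println("flush failed: " + e);
            }
        }
    }

    // Begin flushing all queues every periodMs milliseconds.
    void start(long periodMs) {
        timer.scheduleAtFixedRate(this::flushAll, periodMs, periodMs,
                                  TimeUnit.MILLISECONDS);
    }

    void stop() {
        timer.shutdownNow();
    }
}
```

With this shape, 100 bolt/spout instances would register 200 callbacks but
wake only one thread per millisecond. The trade-off is that one slow flush
delays all the others on the same timer.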
> System load spikes in recent snapshot
> -------------------------------------
>
> Key: STORM-1190
> URL: https://issues.apache.org/jira/browse/STORM-1190
> Project: Apache Storm
> Issue Type: Bug
> Components: storm-core
> Affects Versions: 0.11.0
> Environment: 10x (CoreOS stable (766.4.0) / k8s 1.0.1 / docker
> running on Azure VMs)
> Reporter: Michael Schonfeld
> Priority: Critical
> Attachments: Screenshot 2015-11-08 22.17.57.png, Screenshot
> 2015-11-08 22.18.06.png
>
>
> We've been running Storm's snapshots on our production cluster for a little
> while now (that back pressure support really helped us), and we've noticed a
> sudden spike in system load when going from
> commit@ba1250993d10ffc523c9f5464371fbeb406d216f to the current latest
> commit@c12e28c829fcfabc0a3a775fb9714968b7e3e349. Both versions were running
> the exact same topologies, and there was no significant change in workload.
> Not exactly sure how to even begin to debug this, so we ended up just rolling
> back. Thoughts?
> Stats screenshots attached