[ 
https://issues.apache.org/jira/browse/STORM-1190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14996983#comment-14996983
 ] 

Robert Joseph Evans commented on STORM-1190:
--------------------------------------------

OK, I am very disappointed with modern-day operating systems.  I ran the 
following code on a MacBook Pro.  With a 1ms sleep, 200 threads used up about 
half of the CPU, and going to 300 threads was more or less a denial of service 
on the box.  This is very similar to what we are doing with the batching code.  
Each disruptor queue has a dedicated Timer thread that sleeps for 1ms and then 
tries to flush anything in the batch.  Each bolt/spout instance has 2 disruptor 
queues, so 100 bolt/spout instances on a single box means 200 timer threads 
and, in this case, 50% of the CPU going to sleeping.  I'll see what I can do to 
make Storm not use quite so many threads when it does not need to.
{code}
public class Test extends Thread {
  final long _expectedEnd;
  final long _sleepTime;

  public Test(long ee, long st) {
    _expectedEnd = ee;
    _sleepTime = st;
  }

  @Override
  public void run() {
    try {
      // Do nothing but sleep in short intervals until the deadline passes.
      while (System.currentTimeMillis() < _expectedEnd) {
        Thread.sleep(_sleepTime);
      }
    } catch (InterruptedException e) {
      throw new RuntimeException(e);
    }
  }

  // Usage: java Test [sleepTimeMs] [totalTimeSec] [totalThreads]
  public static void main(String[] args) throws Exception {
    long sleepTime = 1;
    if (args.length > 0) {
      sleepTime = Long.parseLong(args[0]);
    }
    long totalTimeSec = 100;
    if (args.length > 1) {
      totalTimeSec = Long.parseLong(args[1]);
    }
    int totalThreads = 10;
    if (args.length > 2) {
      totalThreads = Integer.parseInt(args[2]);
    }

    long totalTimeMs = totalTimeSec * 1000;
    long expectedEnd = System.currentTimeMillis() + totalTimeMs;
    int ret = -1;
    try {
      Test [] tests = new Test[totalThreads];
      for (int i = 0; i < totalThreads; i++) {
        tests[i] = new Test(expectedEnd, sleepTime);
        tests[i].start();
      }
      for (int i = 0; i < totalThreads; i++) {
        tests[i].join();
      }
      ret = 0;
    } finally {
      System.exit(ret);
    }
  }
}
{code}
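
For comparison, here is a rough sketch of the direction I mean by not using so 
many threads: one timer thread wakes up every interval and flushes every 
registered queue, instead of each queue owning its own sleeping thread.  This 
is not Storm's actual code or the eventual fix.  The names SharedFlusher, 
Flushable, register and flushAll are made up for illustration, and Flushable 
stands in for whatever a disruptor queue does to flush its batch.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: a single periodic timer shared by many queues.
final class SharedFlusher {
  // Hypothetical stand-in for "flush anything in the batch" on one queue.
  interface Flushable {
    void flush();
  }

  private final List<Flushable> _targets = new CopyOnWriteArrayList<>();
  private final ScheduledExecutorService _timer =
      Executors.newSingleThreadScheduledExecutor(r -> {
        Thread t = new Thread(r, "shared-flusher");
        t.setDaemon(true); // do not keep the JVM alive just for flushing
        return t;
      });

  SharedFlusher(long intervalMs) {
    // One wakeup per interval in total, no matter how many queues register.
    _timer.scheduleAtFixedRate(this::flushAll, intervalMs, intervalMs,
        TimeUnit.MILLISECONDS);
  }

  void register(Flushable f) {
    _targets.add(f);
  }

  // Flush every registered target once; also callable directly.
  void flushAll() {
    for (Flushable f : _targets) {
      try {
        f.flush();
      } catch (RuntimeException e) {
        // Swallow so one bad target does not cancel the periodic task.
      }
    }
  }

  void close() {
    _timer.shutdownNow();
  }
}
```

With 200 queues registered this is still a single thread waking up once per 
interval.  The trade-off is that a slow flush on one queue delays the flush of 
every queue registered after it.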

> System load spikes in recent snapshot
> -------------------------------------
>
>                 Key: STORM-1190
>                 URL: https://issues.apache.org/jira/browse/STORM-1190
>             Project: Apache Storm
>          Issue Type: Bug
>          Components: storm-core
>    Affects Versions: 0.11.0
>         Environment: 10x (CoreOS stable (766.4.0) / k8s 1.0.1 / docker 
> running on Azure VMs)
>            Reporter: Michael Schonfeld
>            Priority: Critical
>         Attachments: Screenshot 2015-11-08 22.17.57.png, Screenshot 
> 2015-11-08 22.18.06.png
>
>
> We've been running Storm's snapshots on our production cluster for a little 
> while now (that back pressure support really helped us), and we've noticed a 
> sudden spike in system load when going from 
> commit@ba1250993d10ffc523c9f5464371fbeb406d216f to the current latest 
> commit@c12e28c829fcfabc0a3a775fb9714968b7e3e349. Both versions were running 
> the exact same topologies, and there was no significant change in workload. 
> Not exactly sure how to even begin to debug this, so we ended up just rolling 
> back. Thoughts?
> Stats screenshots are attached.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
