[
https://issues.apache.org/jira/browse/STORM-1190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14998848#comment-14998848
]
ASF GitHub Bot commented on STORM-1190:
---------------------------------------
Github user revans2 commented on the pull request:
https://github.com/apache/storm/pull/870#issuecomment-155471398
@danielschonfeld I don't know 100% what the cause of the issue is, but I
suspect that it is having lots of threads trying to sleep very frequently.
```
storm jar ./examples/storm-starter/storm-starter-topologies-0.11.0-SNAPSHOT.jar storm.starter.ThroughputVsLatency 100 1 5
```
is the test that I have been using. It is really just word count, but with
some latency and system utilization statistics added in. The problem is that
with batching, the CPU utilization under low load is much higher than before the
batching patch went in. I ran the test against
24886eec5c45e7fd30cac804fd080360f17599a0 (before batching),
a0f3412a8268e75f87a95421a53f6bc4b6af9842 (with batching), and the current
patch. All of the measurements are in ms, measured over a 30-second period.
| version | user (ms) | sys (ms) |
|---|---|---|
| no-batch | 4,320 | 4,789 |
| batch | 7,565 | 17,483 |
| this | 8,101 | 10,238 |
So the patch shifted some load from the kernel to user space, and dropped
the overall CPU utilization, but it is still much higher than before.
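As a quick sanity check on the numbers above, the user and sys columns can be summed and compared against the no-batch baseline. This is only a minimal sketch in Python; the dictionary layout and the `total_ms` helper are illustrative names, not part of Storm or the test harness.

```python
# CPU time in ms over a 30-second window, copied from the table above.
measurements = {
    "no-batch": {"user": 4320, "sys": 4789},
    "batch": {"user": 7565, "sys": 17483},
    "this": {"user": 8101, "sys": 10238},
}


def total_ms(version):
    """Return user + sys CPU time (ms) for one measured version."""
    m = measurements[version]
    return m["user"] + m["sys"]


baseline = total_ms("no-batch")
for version in measurements:
    total = total_ms(version)
    print(f"{version}: total={total} ms, {total / baseline:.2f}x no-batch")

# Prints:
# no-batch: total=9109 ms, 1.00x no-batch
# batch: total=25048 ms, 2.75x no-batch
# this: total=18339 ms, 2.01x no-batch
```

By this arithmetic the current patch brings the total down from roughly 2.75x to roughly 2x the pre-batching figure, with the reduction coming from sys time.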
> System load spikes in recent snapshot
> -------------------------------------
>
> Key: STORM-1190
> URL: https://issues.apache.org/jira/browse/STORM-1190
> Project: Apache Storm
> Issue Type: Bug
> Components: storm-core
> Affects Versions: 0.11.0
> Environment: 10x (CoreOS stable (766.4.0) / k8s 1.0.1 / docker
> running on Azure VMs)
> Reporter: Michael Schonfeld
> Priority: Critical
> Attachments: Screenshot 2015-11-08 22.17.57.png, Screenshot
> 2015-11-08 22.18.06.png
>
>
> We've been running Storm's snapshots on our production cluster for a little
> while now (that back pressure support really helped us), and we've noticed a
> sudden spike in system load when going from
> commit@ba1250993d10ffc523c9f5464371fbeb406d216f to the current latest
> commit@c12e28c829fcfabc0a3a775fb9714968b7e3e349. Both versions were running
> the exact same topologies, and there was no significant change in workload.
> Not exactly sure how to even begin to debug this, so we ended up just rolling
> back. Thoughts?
> Stats screenshots attached