[ 
https://issues.apache.org/jira/browse/STORM-1190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14998657#comment-14998657
 ] 

ASF GitHub Bot commented on STORM-1190:
---------------------------------------

Github user revans2 commented on the pull request:

    https://github.com/apache/storm/pull/870#issuecomment-155434976
  
    @danielschonfeld Yes, we need something like that.  We already use a single 
timer thread for all of the system metrics to do something similar, but there 
is no possibility of them blocking.  When the timer goes off it is all atomic 
memory operations, so sharing a single thread between them is not a big deal.
    
    With the disruptor queue, if the queue is full the thread will block trying 
to flush the messages, possibly for a very long time.  This is why I went with 
the ScheduledThreadPoolExecutor.  The problem here is that it has no way to 
spin up new threads under load or tear them down when idle.  I could spend some 
time and build my own, but this change seems to solve a lot of the load 
problem, though not all of it.
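
    To illustrate the blocking concern, here is a minimal Java sketch.  It is 
not code from the pull request: the class name FlushTimerSketch, the method 
fastFlushRuns, and the latch that stands in for a flush stuck on a full queue 
are all hypothetical.  It only shows that a timer pool with one thread lets a 
blocked flush starve every other scheduled task, while a pool with a second 
thread keeps the others running.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class FlushTimerSketch {

    /**
     * Schedules one flush that blocks (a stand-in for a full queue) and one
     * that does not.  Returns true if the non-blocking flush ran while the
     * blocking one was still stuck.
     */
    public static boolean fastFlushRuns(int poolSize) throws InterruptedException {
        ScheduledThreadPoolExecutor timer = new ScheduledThreadPoolExecutor(poolSize);
        CountDownLatch blockedStarted = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        CountDownLatch fastRan = new CountDownLatch(1);
        try {
            // Hypothetical "flush" that blocks until released.
            timer.schedule(() -> {
                blockedStarted.countDown();
                try {
                    release.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }, 0, TimeUnit.MILLISECONDS);

            // Make sure the blocking flush is occupying a thread first.
            blockedStarted.await();

            // A second flush that would finish immediately if it got a thread.
            timer.schedule(fastRan::countDown, 0, TimeUnit.MILLISECONDS);

            // Wait a bounded time to see whether the second flush ran.
            return fastRan.await(500, TimeUnit.MILLISECONDS);
        } finally {
            release.countDown();   // unblock the stuck flush
            timer.shutdownNow();   // discard anything still queued
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("1 thread:  " + fastFlushRuns(1));
        System.out.println("2 threads: " + fastFlushRuns(2));
    }
}
```

    Under these assumptions the call with one thread should report false and 
the call with two threads should report true.  The pool size is fixed when the 
executor is constructed, which is the limitation described above: it does not 
grow under load or shrink when idle.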
    
    I am going to try and spend some time to create something that can do all 
of that, but it is not something I can put together in a few hours and expect 
it to work.


> System load spikes in recent snapshot
> -------------------------------------
>
>                 Key: STORM-1190
>                 URL: https://issues.apache.org/jira/browse/STORM-1190
>             Project: Apache Storm
>          Issue Type: Bug
>          Components: storm-core
>    Affects Versions: 0.11.0
>         Environment: 10x (CoreOS stable (766.4.0) / k8s 1.0.1 / docker 
> running on Azure VMs)
>            Reporter: Michael Schonfeld
>            Priority: Critical
>         Attachments: Screenshot 2015-11-08 22.17.57.png, Screenshot 
> 2015-11-08 22.18.06.png
>
>
> We've been running Storm's snapshots on our production cluster for a little 
> while now (that back pressure support really helped us), and we've noticed a 
> sudden spike in system load when going from 
> commit@ba1250993d10ffc523c9f5464371fbeb406d216f to the current latest 
> commit@c12e28c829fcfabc0a3a775fb9714968b7e3e349. Both versions were running 
> the exact same topologies, and there was no significant change in workload. 
> Not exactly sure how to even begin to debug this, so we ended up just rolling 
> back. Thoughts?
> Stats screenshots attached



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
