[ 
https://issues.apache.org/jira/browse/STORM-2853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16349447#comment-16349447
 ] 

Jungtaek Lim commented on STORM-2853:
-------------------------------------

[~vorin]

Storm 2.0.0 was started to port the Clojure code to Java, which was one of the 
major changes needed for merging JStorm. Although the JStorm merge has not 
happened, many voices (including mine) were in favor of getting rid of Clojure 
for various reasons, so we went ahead and ported most things to Java in the 
master branch. (Regarding the JStorm merge, we can restart that work at any 
time since they donated the code to the ASF, but the huge divergence between 
the two codebases is not easy to cover.)

Storm 2.0.0 also has several improvements, and what you see is one of them: 
in the upcoming Storm 2.0.0 we broke storm-core down into multiple modules. 
"storm-client" holds the client (topology/worker) side interfaces and 
implementations, and will have far fewer dependencies than at present. 
"storm-server" holds the daemon side interfaces and implementations. 
"storm-webserver" holds the HTTP service and UI. We still keep "storm-core" 
because some tests have not been ported yet.

So if you are brave enough to try out a Storm 2.0.0 SNAPSHOT, please check out 
the master branch and build your own dist. Most of the time you will want to 
check out 1.x-branch instead, to test the latest changes on the 1.x version line.
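
A rough sketch of what checking out a branch and building a dist might look 
like. The repository URL, the Maven flags, and the storm-dist/binary path are 
assumptions based on the usual Apache Storm source layout, and the `run` and 
`build_storm_dist` helpers are made up for illustration; please check the 
developer documentation in the branch you actually build.

```shell
# Sketch only: the URL, flags, and paths below are assumptions, not verified
# against any particular Storm revision.
STORM_REPO="${STORM_REPO:-https://github.com/apache/storm.git}"
STORM_BRANCH="${STORM_BRANCH:-master}"   # set to 1.x-branch for the 1.x line

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Clone the chosen branch, build all modules, then package the binary dist.
build_storm_dist() {
  run git clone --branch "$STORM_BRANCH" "$STORM_REPO" storm &&
  run cd storm &&
  run mvn clean install -DskipTests &&
  run cd storm-dist/binary &&
  run mvn package -Dgpg.skip=true
}
```

Running `DRY_RUN=1 build_storm_dist` only prints the commands, and 
`STORM_BRANCH=1.x-branch build_storm_dist` targets the 1.x line instead of 
master.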

> Deactivated topologies cause high cpu utilization
> -------------------------------------------------
>
>                 Key: STORM-2853
>                 URL: https://issues.apache.org/jira/browse/STORM-2853
>             Project: Apache Storm
>          Issue Type: Bug
>          Components: storm-core
>    Affects Versions: 1.1.0
>            Reporter: Stuart
>            Assignee: Jungtaek Lim
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: exclamation.zip
>
>          Time Spent: 1h
>  Remaining Estimate: 0h
>
> The issue is high CPU usage for deactivated Apache Storm topologies. 
> I can reliably re-create the issue using the steps below but I haven't 
> identified the exact cause or a solution yet.
> The environment is a storm cluster on which 1 topology is running (The 
> topology is extremely simple, I used the exclamation example).  It is 
> INACTIVE.  Initially there is normal CPU usage.  However, when I kill all 
> topology JVM processes on all supervisors and let Storm restart them again, I 
> find that some time later (~9 hours) the CPU usage per JVM process rises to 
> nearly 100%.  I have tested an ACTIVE topology and this does not happen with 
> it.  I have also tested more than one topology and observe the same results 
> when they're in the INACTIVE state.
> ***Steps to re-create:***
>  1. Run 1 topology on an Apache Storm cluster
>  2. Deactivate it
>  3. Kill **all** topology JVM processes on all supervisors (Storm will 
> restart them)
>  4. Observe the CPU usage on Supervisors rise to nearly 100% for all 
> **INACTIVE** topology JVM processes.
> ***Environment***
> Apache Storm 1.1.0 running on 3 VMs (1 nimbus and 2 supervisors).
> Cluster Summary:
>  - Supervisors: 2 
>  - Used Slots: 2 
>  - Available Slots: 38 
>  - Total Slots: 40
>  - Executors: 50 
>  - Tasks: 50
> The topology has 2 workers and 50 executors/tasks (threads).
> ***Investigation so far:***
> Apart from being able to reliably re-create the issue, I have identified, for 
> the affected topology JVM process, the threads using the most CPU.  There are 
> 102 threads total in the process, 97 blocked, 5 IN_NATIVE.  The threads using 
> the most CPU are identical and there are 23 of them (all in BLOCKED state):
>     Thread 28558: (state = BLOCKED)
>      - sun.misc.Unsafe.park(boolean, long) @bci=0 (Compiled frame; information may be imprecise)
>      - java.util.concurrent.locks.LockSupport.parkNanos(long) @bci=11, line=338 (Compiled frame)
>      - com.lmax.disruptor.MultiProducerSequencer.next(int) @bci=82, line=136 (Compiled frame)
>      - com.lmax.disruptor.RingBuffer.next(int) @bci=5, line=260 (Interpreted frame)
>      - org.apache.storm.utils.DisruptorQueue.publishDirect(java.util.ArrayList, boolean) @bci=18, line=517 (Interpreted frame)
>      - org.apache.storm.utils.DisruptorQueue.access$1000(org.apache.storm.utils.DisruptorQueue, java.util.ArrayList, boolean) @bci=3, line=61 (Interpreted frame)
>      - org.apache.storm.utils.DisruptorQueue$ThreadLocalBatcher.flush(boolean) @bci=50, line=280 (Interpreted frame)
>      - org.apache.storm.utils.DisruptorQueue$Flusher.run() @bci=55, line=303 (Interpreted frame)
>      - java.util.concurrent.Executors$RunnableAdapter.call() @bci=4, line=511 (Compiled frame)
>      - java.util.concurrent.FutureTask.run() @bci=42, line=266 (Compiled frame)
>      - java.util.concurrent.ThreadPoolExecutor.runWorker(java.util.concurrent.ThreadPoolExecutor$Worker) @bci=95, line=1142 (Compiled frame)
>      - java.util.concurrent.ThreadPoolExecutor$Worker.run() @bci=5, line=617 (Interpreted frame)
>      - java.lang.Thread.run() @bci=11, line=745 (Interpreted frame)
> I identified this thread by using `jstack` to get a thread dump for the 
> process:
>  
>     jstack -F <pid> > jstack<pid>.txt
> and `top` to identify the threads within the process using the most CPU:
>     top -H -p <pid> 

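A side note on matching the two tools from the quoted description, offered as 
a sketch rather than as part of the original report: `top -H` lists thread IDs 
in decimal, and the forced `jstack -F` dump quoted above also prints decimal 
IDs (`Thread 28558`). A regular `jstack <pid>` dump (without `-F`) is assumed 
here to label threads with a hexadecimal `nid=0x...` field instead, so the 
decimal ID needs converting before searching such a dump. The helper name 
`tid_to_nid` is made up for illustration.

```shell
# Convert a decimal thread ID (as shown by `top -H`) into the hex form
# assumed to appear in the nid= field of a regular jstack dump.
tid_to_nid() {
  printf 'nid=0x%x\n' "$1"
}

tid_to_nid 28558   # prints nid=0x6f8e
```

With that in place, something like `jstack <pid> | grep "$(tid_to_nid 28558)"` 
should locate the matching thread entry in a non-forced dump.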


--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
