[ https://issues.apache.org/jira/browse/STORM-956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14636456#comment-14636456 ]

ASF GitHub Bot commented on STORM-956:
--------------------------------------

GitHub user chuanlei opened a pull request:

    https://github.com/apache/storm/pull/647

    (STORM-956) When execute() or nextTuple() hangs on external resources, stop the worker's heartbeat

    

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/chuanlei/storm feature-stop-worker-heartbeat-when-executor-threads-hang

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/storm/pull/647.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #647
    
----
commit c0d1c4ef6ae0d1e144f5af85174d68d5a93eb06a
Author: chuanlei <[email protected]>
Date:   2015-07-22T07:37:28Z

    stop worker heartbeat, when the executor threads hang-on

----


> When execute() or nextTuple() hangs on external resources, stop the worker's heartbeat
> -----------------------------------------------------------------------------------------
>
>                 Key: STORM-956
>                 URL: https://issues.apache.org/jira/browse/STORM-956
>             Project: Apache Storm
>          Issue Type: Improvement
>            Reporter: Chuanlei Ni
>            Assignee: Chuanlei Ni
>            Priority: Minor
>   Original Estimate: 6h
>  Remaining Estimate: 6h
>
> Sometimes the worker threads created by `mk-threads` in executor.clj hang on
> external resources, or hang for other, unknown reasons. When this happens, the
> worker stops processing tuples. I think it is better to kill such a worker to
> resolve the hang. I plan to:
> 1. Like `setup-ticks`, send a system tick to the receive-queue.
> 2. Have tuple-action-fn handle this system tick and record, in executor-data,
> the time at which it processed the tick.
> 3. When the worker does its local heartbeat, check the time the executor wrote
> to executor-data. If that time is too far in the past (for example, 3 minutes),
> the worker skips the heartbeat, so the supervisor can deal with the problem.
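A minimal Java sketch of the three steps above, for illustration only. The proposal itself targets the Clojure code in executor.clj; every name here (ExecutorLivenessCheck, SYSTEM_TICK, startTicks, onTuple, shouldHeartbeat) is hypothetical and not part of Storm's API. It assumes one executor thread writes the tick timestamp while a separate heartbeat thread reads it, and that the clock can be injected for testing.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

// Hypothetical sketch of the STORM-956 proposal; these names are NOT Storm APIs.
class ExecutorLivenessCheck {
    // Marker object standing in for the system tick placed on the receive-queue.
    static final Object SYSTEM_TICK = new Object();

    private final long maxSilenceMillis;   // e.g. 3 minutes, as in the proposal
    private final LongSupplier clock;      // injectable so the check can be tested
    private final AtomicLong lastTickMillis;

    ExecutorLivenessCheck(long maxSilenceMillis, LongSupplier clock) {
        this.maxSilenceMillis = maxSilenceMillis;
        this.clock = clock;
        // Count construction as the first tick so a new executor is not flagged at once.
        this.lastTickMillis = new AtomicLong(clock.getAsLong());
    }

    // Step 1: periodically enqueue a system tick, similar in spirit to setup-ticks.
    // The caller owns the returned scheduler and must shut it down.
    static ScheduledExecutorService startTicks(BlockingQueue<Object> receiveQueue, long periodMillis) {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        timer.scheduleAtFixedRate(() -> receiveQueue.offer(SYSTEM_TICK),
                periodMillis, periodMillis, TimeUnit.MILLISECONDS);
        return timer;
    }

    // Step 2: called on the executor thread for every item it takes off the queue.
    void onTuple(Object tuple) {
        if (tuple == SYSTEM_TICK) {
            // Reaching this line shows the thread is still draining its queue.
            lastTickMillis.set(clock.getAsLong());
            return;
        }
        // ... normal tuple handling would go here ...
    }

    // Step 3: called by the worker before it writes its local heartbeat.
    // Returns false when the last tick is too old, i.e. the thread looks hung.
    boolean shouldHeartbeat() {
        return clock.getAsLong() - lastTickMillis.get() <= maxSilenceMillis;
    }
}
```

Recording the time only for the system tick, rather than for every tuple, means an executor that is idle but healthy still refreshes its timestamp, while one stuck inside a blocking call never reaches the tick and goes stale. An AtomicLong is used because the writer (executor thread) and reader (heartbeat thread) differ.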



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
