[ 
https://issues.apache.org/jira/browse/MESOS-5763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15433168#comment-15433168
 ] 

Yan Xu commented on MESOS-5763:
-------------------------------

[~megha.sharma] contributed a test for this.

{noformat}
commit a064505e411fe78a257e9b336a888f1eeddaa949
Author: Megha Sharma <[email protected]>
Date:   Mon Aug 22 14:51:07 2016 -0700

    Added test to simulate slow/unresponsive fetch.
    
    Added test to simulate the scenario of slow/unresponsive HDFS leading
    to executor register timeout and verify that slave gets notified of the
    failure.
    
    Review: https://reviews.apache.org/r/50000/
{noformat}

> Task stuck in fetching is not cleaned up after 
> --executor_registration_timeout.
> -------------------------------------------------------------------------------
>
>                 Key: MESOS-5763
>                 URL: https://issues.apache.org/jira/browse/MESOS-5763
>             Project: Mesos
>          Issue Type: Bug
>          Components: containerization
>    Affects Versions: 0.28.0, 1.0.0, 0.29.0
>            Reporter: Yan Xu
>            Assignee: Yan Xu
>            Priority: Blocker
>             Fix For: 0.28.3, 1.0.0, 0.27.4
>
>
> When the fetching process hangs forever (for example, because of HDFS 
> issues), the Mesos containerizer attempts to destroy the container and kill 
> the executor after {{--executor_registration_timeout}}. However, this 
> reliably fails for us: the launcher destroy kills the executor and the 
> container is destroyed, but the agent never finds out that the executor 
> has terminated, leaving the task in the STAGING state forever.
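
The expected behavior described above can be illustrated with a small, self-contained simulation. This is only a sketch and is not Mesos code: `hung_fetch`, `launch`, `simulate`, the `on_executor_terminated` callback, and the "RUNNING"/"FAILED" labels are all hypothetical names chosen for illustration. It assumes the fetch never completes, and that once the registration timeout fires the fetch is cancelled and the agent side is explicitly notified, which is the step the issue reports as missing.

```python
import asyncio


async def hung_fetch():
    # Hypothetical stand-in for a fetcher blocked on an unresponsive HDFS:
    # waits on an event that is never set, so it never completes on its own.
    await asyncio.Event().wait()


async def launch(registration_timeout, on_executor_terminated):
    """Start a fetch; if it outlives the timeout, cancel it and notify."""
    try:
        # wait_for cancels the awaited coroutine when the timeout expires,
        # which plays the role of destroying the container here.
        await asyncio.wait_for(hung_fetch(), timeout=registration_timeout)
        return "RUNNING"  # hypothetical label for a successful launch
    except asyncio.TimeoutError:
        # The notification the issue says never reaches the agent.
        # Without it, the caller would keep seeing the task as STAGING.
        on_executor_terminated("fetch timed out")
        return "FAILED"  # hypothetical label for the cleaned-up task


def simulate(registration_timeout=0.05):
    # Collect notifications in a list so the outcome can be inspected.
    notifications = []
    state = asyncio.run(launch(registration_timeout, notifications.append))
    return state, notifications
```

Calling `simulate()` returns the final state together with the list of notifications delivered to the callback. In this sketch the failure path always invokes the callback before returning, so a timed-out fetch can never leave the task without a termination notice.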



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
