[ 
https://issues.apache.org/jira/browse/MESOS-5763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15365181#comment-15365181
 ] 

Yan Xu commented on MESOS-5763:
-------------------------------

Committed everything other than https://reviews.apache.org/r/49726/

Let's add an integration test as a separate JIRA.

{noformat:title=}
commit 13c2020d429d6d000bb37649f2a1be47de5b8f8c
Author: Jiang Yan Xu <xuj...@apple.com>
Date:   Fri Jul 1 18:12:01 2016 -0700

    Made Mesos containerizer error messages more consistent.
    
    We've been using slightly different wordings of the same condition in
    multiple places in Mesos containerizer but they don't provide
    additional information about where this failure is thrown in a long
    continuation chain. Since failures don't capture the location in the
    code we'd better distinguish them in a more meaningful way to assist
    debugging.
    
    Review: https://reviews.apache.org/r/49653

commit 8907b5d5e1f007c4592a0417a4d9f20d7e1f8efd
Author: Jiang Yan Xu <xuj...@apple.com>
Date:   Fri Jul 1 18:11:29 2016 -0700

    Improved Mesos containerizer invariant checking.
    
    One of the causes of MESOS-5763 is the lack of invariant
    checking. Mesos containerizer transitions the container state in
    particular ways so when continuation chains could potentially be
    interleaved with other actions we should verify the state transitions.
    
    Review: https://reviews.apache.org/r/49652

commit 48b1bfa6ec5f88ceea327ae3c5345fd4d11442c7
Author: Jiang Yan Xu <xuj...@apple.com>
Date:   Fri Jul 1 15:25:54 2016 -0700

    Improved Mesos containerizer logging and documentation.
    
    Review: https://reviews.apache.org/r/49651

commit dc18dd7a5ec48a184aeb1c5a7c475ecf7691734b
Author: Jiang Yan Xu <xuj...@apple.com>
Date:   Wed Jul 6 13:48:34 2016 -0700

    Fail container launch if it's destroyed during logger->prepare().
    
    Review: https://reviews.apache.org/r/49725

commit 114474c443678997da8f931a41703f1095206421
Author: Jiang Yan Xu <xuj...@apple.com>
Date:   Fri Jul 1 15:27:37 2016 -0700

    Fixed Mesos containerizer to set container FETCHING state.
    
    If the container state is not properly set to FETCHING, Mesos agent
    cannot detect the terminated executor when the fetcher times out.
    
    Review: https://reviews.apache.org/r/49650
{noformat}
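
For readers unfamiliar with the invariant checking mentioned in the second commit, here is a minimal C++ sketch of the idea. It is not the committed code: the `Container` struct, the `validTransition` and `transition` helpers, and every state name other than FETCHING are assumptions made up for illustration. The point is that each step in a continuation chain verifies the current state before advancing, so a destroy that interleaves with the launch is caught at that step with a message that names where it happened.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical container states for this sketch. Only FETCHING is named
// in the commits above; the other names are assumptions.
enum class State { PREPARING, ISOLATING, FETCHING, RUNNING, DESTROYING };

// Returns true if moving from `from` to `to` is allowed in this sketch.
// A destroy may interleave with any launch step, so DESTROYING is
// reachable from every state except DESTROYING itself.
bool validTransition(State from, State to)
{
  if (to == State::DESTROYING) {
    return from != State::DESTROYING;
  }

  switch (from) {
    case State::PREPARING: return to == State::ISOLATING;
    case State::ISOLATING: return to == State::FETCHING;
    case State::FETCHING:  return to == State::RUNNING;
    default:               return false;
  }
}

// Hypothetical container record holding only the state.
struct Container
{
  State state = State::PREPARING;

  // Called at the start of each continuation. `where` names the step so
  // the failure message tells us which part of the chain detected it.
  void transition(State to, const std::string& where)
  {
    if (!validTransition(state, to)) {
      throw std::runtime_error(
          "Invalid container state transition during " + where);
    }
    state = to;
  }
};
```

In this sketch, a container that was moved to DESTROYING while fetching would make a later `transition(State::RUNNING, "exec")` throw instead of silently continuing the launch.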

> Task stuck in fetching is not cleaned up after 
> --executor_registration_timeout.
> -------------------------------------------------------------------------------
>
>                 Key: MESOS-5763
>                 URL: https://issues.apache.org/jira/browse/MESOS-5763
>             Project: Mesos
>          Issue Type: Bug
>          Components: containerization
>    Affects Versions: 0.28.0, 1.0.0, 0.29.0
>            Reporter: Yan Xu
>            Assignee: Yan Xu
>            Priority: Blocker
>             Fix For: 0.28.3, 1.0.0, 0.27.4
>
>
> When the fetching process hangs forever due to reasons such as HDFS issues, 
> Mesos containerizer would attempt to destroy the container and kill the 
> executor after {{--executor_registration_timeout}}. However this reliably 
> fails for us: the executor would be killed by the launcher destroy and the 
> container would be destroyed but the agent would never find out that the 
> executor is terminated thus leaving the task in the STAGING state forever.



