[ 
https://issues.apache.org/jira/browse/APEXCORE-743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandesh reassigned APEXCORE-743:
--------------------------------

    Assignee: Sandesh

> Killed container is shown as running
> ------------------------------------
>
>                 Key: APEXCORE-743
>                 URL: https://issues.apache.org/jira/browse/APEXCORE-743
>             Project: Apache Apex Core
>          Issue Type: Bug
>            Reporter: Sandesh
>            Assignee: Sandesh
>
> Observed behavior:
> 1. The container's heartbeat times out.
> 2. The AppMaster sends a request to kill the container.
> 3. The container is killed.
> 4. The AppMaster state is not updated, and no new container is allocated.
> After analyzing the code, this is the likely cause:
> 1. The AppMaster sends the kill request to the NodeManager (NM).
> 2. The NM kills the container, but the NM callback never fires.
> RecoverContainer is called in the NM callback, so in this case it is never
> invoked.
> 3. As a result, the AppMaster state is not updated.
> Possible fix:
> Add a timeout for the NM callback, so that if the NM does not confirm in time
> that the container was killed, RecoverContainer is called anyway.
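
The proposed fix could be sketched roughly as follows. This is only an
illustration of the timeout idea, not code from Apex or YARN: the class
PendingKillTracker, its method names, and the Consumer standing in for
RecoverContainer are all hypothetical, and it assumes the AppMaster has a
periodic loop from which expire() can be called.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

/** Hypothetical helper: tracks kill requests that still await an NM callback. */
public class PendingKillTracker {
    // containerId -> deadline (ms) by which the NM callback is expected
    private final Map<String, Long> deadlines = new ConcurrentHashMap<>();
    private final long timeoutMillis;
    // Stand-in for the real recovery routine (an assumption of this sketch)
    private final Consumer<String> recoverContainer;

    public PendingKillTracker(long timeoutMillis, Consumer<String> recoverContainer) {
        this.timeoutMillis = timeoutMillis;
        this.recoverContainer = recoverContainer;
    }

    /** Call right after the kill request has been sent to the NM. */
    public void killRequested(String containerId, long nowMillis) {
        deadlines.putIfAbsent(containerId, nowMillis + timeoutMillis);
    }

    /** Call from the NM callback; recovers only if the kill is still pending. */
    public void onContainerStopped(String containerId) {
        if (deadlines.remove(containerId) != null) {
            recoverContainer.accept(containerId);
        }
    }

    /** Call periodically; recovers containers whose callback never arrived. */
    public List<String> expire(long nowMillis) {
        List<String> expired = new ArrayList<>();
        for (Map.Entry<String, Long> e : deadlines.entrySet()) {
            // remove(key, value) succeeds at most once per entry, so a late
            // callback racing with this loop cannot trigger a second recovery
            if (nowMillis >= e.getValue() && deadlines.remove(e.getKey(), e.getValue())) {
                recoverContainer.accept(e.getKey());
                expired.add(e.getKey());
            }
        }
        return expired;
    }
}
```

Passing the current time in explicitly keeps the sketch easy to test. Whichever
path removes the pending entry first (callback or timeout) performs the
recovery, so each kill request leads to at most one recovery call.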



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
