[ https://issues.apache.org/jira/browse/SAMZA-2248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hai Lu updated SAMZA-2248:
--------------------------
    Fix Version/s: 1.3

> Fix AM bookkeeping on receiving dead container notifications
> ------------------------------------------------------------
>
>                 Key: SAMZA-2248
>                 URL: https://issues.apache.org/jira/browse/SAMZA-2248
>             Project: Samza
>          Issue Type: Bug
>            Reporter: Xinyu Liu
>            Assignee: Xinyu Liu
>            Priority: Major
>             Fix For: 1.3
>
>          Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Issue TL;DR:
> 1. The AM gets extra containers from the RM, which it saves for later use.
> 2. When a container saved in step 1 dies, the AM receives the callback but 
> does nothing about it.
> 3. Later, when looking for a container to use, the AM picks up the dead 
> container that it saved and never cleaned up (steps 1 and 2) and launches a 
> container.
> 4. If this launched container ever dies, the RM never notifies the AM about 
> it, because it sees it as a duplicate (step 2).
> 5. The job is left without the container rescheduled and needs to be 
> restarted.
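
The bookkeeping gap described in the steps above can be illustrated with a minimal sketch. This is not Samza's actual code: the class ContainerBookkeeping and its methods are hypothetical names invented for illustration, and container IDs are plain strings. It assumes the fix amounts to removing a dead container from the saved pool when the completion callback arrives, so that it can never be picked for a later launch.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Optional;
import java.util.Set;

/** Hypothetical sketch of AM-side bookkeeping; not Samza's actual classes. */
class ContainerBookkeeping {
    // Extra containers granted by the RM and saved for later use (step 1).
    private final Deque<String> sparePool = new ArrayDeque<>();
    // Containers the AM has launched and expects to be running.
    private final Set<String> running = new HashSet<>();

    /** The RM granted a container that is not needed right now. */
    void onExtraContainerAllocated(String containerId) {
        sparePool.addLast(containerId);
    }

    /**
     * The RM reports a dead container.
     * Returns true if a launched container died and must be rescheduled.
     */
    boolean onContainerCompleted(String containerId) {
        // Assumed fix: also drop dead containers from the spare pool, so the
        // stale entry from steps 2 and 3 can never be chosen for a launch.
        if (sparePool.remove(containerId)) {
            return false; // an idle spare died; nothing to reschedule
        }
        return running.remove(containerId);
    }

    /** Pick a saved container to launch on, if one is still available. */
    Optional<String> launchOnSpare() {
        String id = sparePool.pollFirst();
        if (id == null) {
            return Optional.empty();
        }
        running.add(id);
        return Optional.of(id);
    }
}
```

With this bookkeeping, a saved container that dies before use is simply forgotten, and the next launch uses a live spare or finds none, in which case the caller would have to request a new container from the RM.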



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
