[ https://issues.apache.org/jira/browse/SAMZA-2248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hai Lu updated SAMZA-2248:
--------------------------
Fix Version/s: 1.3
> Fix AM bookkeeping on receiving dead container notifications
> ------------------------------------------------------------
>
> Key: SAMZA-2248
> URL: https://issues.apache.org/jira/browse/SAMZA-2248
> Project: Samza
> Issue Type: Bug
> Reporter: Xinyu Liu
> Assignee: Xinyu Liu
> Priority: Major
> Fix For: 1.3
>
> Time Spent: 1h 20m
> Remaining Estimate: 0h
>
> Issue tldr:
> 1. AM gets extra containers from the RM which it saves for later use
> 2. When a container saved in step 1 dies, the AM receives the
> dead-container callback but does nothing about it.
> 3. Later, when we look for a container to use, we pick up the dead
> container that was saved and never cleaned up (steps 1 and 2) and launch a
> container on it.
> 4. Now, if this launched container ever dies, the RM will never notify the
> AM about it, since it sees the notification as a duplicate of the one in
> step 2.
> 5. The job is left without the container rescheduled and will need to be
> restarted.
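
The bookkeeping gap in steps 1 to 3 can be sketched as follows. This is a minimal illustration, not the actual SAMZA-2248 patch: the class `SavedContainerPool` and its methods are hypothetical names, and containers are represented by plain string IDs rather than Samza or YARN types. The idea is only that a dead-container notification should evict the container from the pool of saved extras, so a later lookup cannot hand out a dead one.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Optional;

/**
 * Hypothetical bookkeeping for extra containers that an AM saves for later
 * use. Not Samza's real class; container IDs are plain strings here.
 */
final class SavedContainerPool {
    // Extra containers received from the RM, oldest first.
    private final Deque<String> saved = new ArrayDeque<>();

    /** Step 1: remember an extra container for later use. */
    void save(String containerId) {
        saved.addLast(containerId);
    }

    /**
     * Step 2, fixed: on a dead-container notification, drop the container
     * from the pool. Returns true if it was one of the saved containers.
     */
    boolean onContainerDead(String containerId) {
        return saved.remove(containerId);
    }

    /** Step 3: pick a saved container to launch on, if any remain. */
    Optional<String> takeForLaunch() {
        return Optional.ofNullable(saved.pollFirst());
    }
}
```

With this shape, a container that dies while it is still saved never reaches `takeForLaunch()`, so the sequence in steps 3 to 5 cannot start from a stale entry.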