[ https://issues.apache.org/jira/browse/APEXCORE-426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tushar Gosavi closed APEXCORE-426.
----------------------------------
Closing after 3.6.0 release
> Support work preserving AM recovery
> -----------------------------------
>
> Key: APEXCORE-426
> URL: https://issues.apache.org/jira/browse/APEXCORE-426
> Project: Apache Apex Core
> Issue Type: Improvement
> Reporter: Thomas Weise
> Assignee: Sandesh
> Labels: apex-hadoop-version
> Fix For: 3.6.0
>
>
> On app master failure, the streaming containers should continue running.
> As of Hadoop 2.2, YARN automatically terminates all containers and the
> replacement app master relaunches them. Once we move to a newer minimum
> Hadoop version, we should leverage work-preserving restart.
> The mechanism in Apex containers to locate the new master process is already
> in place.
>
> Test Cases:
> 1. Kill the app master: only the app master's container ID should change; all
> other container IDs should remain the same.
> 2. Kill the app master and a few other containers, and make sure that the
> killed containers are recovered.
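
The two test cases can be expressed as checks over container ID sets captured
before and after the app master is killed. The following is a minimal sketch,
not Apex code: the class name AmRecoveryCheck, both method names, and the
sample container IDs are hypothetical. It also assumes that "recovered" in
test case 2 means the surviving containers keep their IDs while each killed
container is replaced by one with a new ID, since YARN does not reuse
container IDs.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

public class AmRecoveryCheck {

    // Test case 1: the app master container ID changes, every other ID is unchanged.
    public static boolean onlyAppMasterChanged(String amBefore, Set<String> workersBefore,
                                               String amAfter, Set<String> workersAfter) {
        return !amBefore.equals(amAfter) && workersBefore.equals(workersAfter);
    }

    // Test case 2 (assumed interpretation): survivors keep their IDs, killed IDs
    // are gone, and the total number of worker containers is restored.
    public static boolean killedContainersRecovered(Set<String> workersBefore,
                                                    Set<String> killed,
                                                    Set<String> workersAfter) {
        Set<String> survivors = new HashSet<>(workersBefore);
        survivors.removeAll(killed);
        boolean survivorsKept = workersAfter.containsAll(survivors);
        boolean killedGone = Collections.disjoint(workersAfter, killed);
        boolean countRestored = workersAfter.size() == workersBefore.size();
        return survivorsKept && killedGone && countRestored;
    }

    public static void main(String[] args) {
        // Hypothetical container IDs, shortened for readability.
        Set<String> before = new HashSet<>(Arrays.asList("c2", "c3", "c4"));

        // Case 1: app master moves from c1 to c5, workers untouched.
        System.out.println(onlyAppMasterChanged("c1", before, "c5",
                new HashSet<>(Arrays.asList("c2", "c3", "c4"))));

        // Case 2: c3 was killed along with the app master and replaced by c6.
        System.out.println(killedContainersRecovered(before,
                new HashSet<>(Arrays.asList("c3")),
                new HashSet<>(Arrays.asList("c2", "c4", "c6"))));
    }
}
```

In a real test the ID sets would come from the cluster, for example from the
YARN container listing for the application, taken before the kill and again
after the replacement app master has registered.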