[ https://issues.apache.org/jira/browse/APEXCORE-426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15891488#comment-15891488 ]
Sandesh commented on APEXCORE-426:
----------------------------------

Adding an answer to the question asked by [~PramodSSImmaneni]

Q: Just for completeness, can you explain on the JIRA what happens if plan changes were saved but Stram was shut down before it could apply them? Will the new instance make those changes?

A: When Stram recovers, it uses the last checkpointed plan. If the running containers differ from the expected plan, the following happens:
1. Containers unknown to Stram are killed/rejected.
2. Containers that are not responding are rescheduled.

> Support work preserving AM recovery
> -----------------------------------
>
>                 Key: APEXCORE-426
>                 URL: https://issues.apache.org/jira/browse/APEXCORE-426
>             Project: Apache Apex Core
>          Issue Type: Improvement
>            Reporter: Thomas Weise
>            Assignee: Sandesh
>              Labels: apex-hadoop-version
>
> On app master failure, the streaming containers should continue running.
> As of 2.2, YARN will automatically terminate all containers and the
> replacement app master will relaunch them. Once we move to a newer minimum
> Hadoop version, we should leverage work preserving restart.
> The mechanism in Apex containers to locate the new master process is already
> in place.
>
> Test Cases:
> 1. Kill the app-master: only the app-master container ID should change; all
> other container IDs should remain the same.
> 2. Kill the app-master and a few other containers; make sure that the killed
> containers are recovered.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
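
The recovery behaviour described in the answer above (compare the last checkpointed plan with the containers that are actually running, kill the unknown ones, reschedule the silent ones) can be illustrated with a small sketch. This is not Apex code: the class `RecoveryReconcileSketch`, the method `reconcile`, and the use of plain string container IDs are all hypothetical names chosen for illustration, and the real Stram logic may differ in detail.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch, not part of Apache Apex.
class RecoveryReconcileSketch {

    // Outcome of comparing the checkpointed plan with the containers that reported in.
    static final class Result {
        final Set<String> toKill = new TreeSet<>();        // running but unknown to the plan
        final Set<String> toReschedule = new TreeSet<>();  // planned but not responding
        final Set<String> kept = new TreeSet<>();          // planned and responding
    }

    // planned:    container IDs in the last checkpointed plan (assumed input)
    // responding: container IDs that contacted the recovered master (assumed input)
    static Result reconcile(Set<String> planned, Set<String> responding) {
        Result result = new Result();
        for (String id : responding) {
            if (!planned.contains(id)) {
                result.toKill.add(id); // case 1: unknown to Stram, killed/rejected
            }
        }
        for (String id : planned) {
            if (responding.contains(id)) {
                result.kept.add(id);
            } else {
                result.toReschedule.add(id); // case 2: not responding, rescheduled
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Set<String> planned = new HashSet<>(Arrays.asList("c1", "c2", "c3"));
        Set<String> responding = new HashSet<>(Arrays.asList("c1", "c3", "c9"));
        Result r = reconcile(planned, responding);
        System.out.println("kill=" + r.toKill
            + " reschedule=" + r.toReschedule
            + " kept=" + r.kept);
    }
}
```

In this example `c9` stands for a container the plan does not know about and `c2` for a planned container that never reported back, which correspond to cases 1 and 2 in the answer.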