[ https://issues.apache.org/jira/browse/APEXCORE-426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15892154#comment-15892154 ]
Pramod Immaneni commented on APEXCORE-426:
------------------------------------------

How long will it take for 2. to happen? HEARTBEAT_TIMEOUT?

> Support work preserving AM recovery
> -----------------------------------
>
>                 Key: APEXCORE-426
>                 URL: https://issues.apache.org/jira/browse/APEXCORE-426
>             Project: Apache Apex Core
>          Issue Type: Improvement
>            Reporter: Thomas Weise
>            Assignee: Sandesh
>              Labels: apex-hadoop-version
>
> On app master failure, the streaming containers should continue running.
> As of 2.2, YARN will automatically terminate all containers and the
> replacement app master will relaunch them. Once we move to a newer minimum
> Hadoop version, we should leverage work preserving restart.
> The mechanism in Apex containers to locate the new master process is already
> in place.
>
> Test Cases:
> 1. Kill the app-master - only the app-master container id should change; all
> other container ids should remain the same.
> 2. Kill the app-master and a few other containers; make sure that the killed
> containers are recovered.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)