[ https://issues.apache.org/jira/browse/FLINK-2790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14944675#comment-14944675 ]
Till Rohrmann commented on FLINK-2790: -------------------------------------- What did the logs say? I try to reproduce it. On Mon, Oct 5, 2015 at 5:50 PM, ASF GitHub Bot (JIRA) <j...@apache.org> > Add high availability support for Yarn > -------------------------------------- > > Key: FLINK-2790 > URL: https://issues.apache.org/jira/browse/FLINK-2790 > Project: Flink > Issue Type: Sub-task > Components: JobManager, TaskManager > Reporter: Till Rohrmann > Fix For: 0.10 > > > Add master high availability support for Yarn. The idea is to let Yarn > restart a failed application master in a new container. For that, we set the > number of application retries to something greater than 1. > From version 2.4.0 onwards, it is possible to reuse already started > containers for the TaskManagers, thus, avoiding unnecessary restart delays. > From version 2.6.0 onwards, it is possible to specify an interval in which > the number of application attempts have to be exceeded in order to fail the > job. This will prevent long running jobs from eventually depleting all > available application attempts. -- This message was sent by Atlassian JIRA (v6.3.4#6332)