[
https://issues.apache.org/jira/browse/FLINK-2790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14946612#comment-14946612
]
ASF GitHub Bot commented on FLINK-2790:
---------------------------------------
Github user tillrohrmann commented on the pull request:
https://github.com/apache/flink/pull/1213#issuecomment-146136443
I found the issue. My code did not properly reflect the methods
`setKeepContainersAcrossApplicationAttempts` and
`setAttemptFailuresValidityInterval`. With the fix, already started containers
are retained. Tested it with Yarn 2.7.1.
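
For readers following along, here is a minimal sketch of how a setter that exists only in newer Yarn versions could be looked up and called by name. It is not the code from the pull request. It assumes the two methods are invoked reflectively because they are missing in older Yarn releases. `YarnOptionalSetters`, `invokeIfPresent` and `FakeContext` are hypothetical names, and `FakeContext` is only a stand-in for Yarn's submission context so the example runs without Hadoop on the classpath.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

final class YarnOptionalSetters {

    /**
     * Calls target.methodName(arg) if a public method with that name and
     * parameter type exists. Returns true if it was called, false if the
     * method is missing (for example on an older Yarn version).
     */
    static boolean invokeIfPresent(
            Object target, String methodName, Class<?> paramType, Object arg) {
        try {
            Method method = target.getClass().getMethod(methodName, paramType);
            method.invoke(target, arg);
            return true;
        } catch (NoSuchMethodException e) {
            // The running version does not offer this setter; skip it.
            return false;
        } catch (IllegalAccessException | InvocationTargetException e) {
            throw new RuntimeException("Could not invoke " + methodName, e);
        }
    }

    /**
     * Hypothetical stand-in for a submission context that supports keeping
     * containers but not the failure validity interval.
     */
    public static class FakeContext {
        boolean keepContainers;

        public void setKeepContainersAcrossApplicationAttempts(boolean keep) {
            this.keepContainers = keep;
        }
    }

    public static void main(String[] args) {
        FakeContext context = new FakeContext();

        // Note the primitive parameter types: boolean.class and long.class.
        boolean keepSet = invokeIfPresent(
                context, "setKeepContainersAcrossApplicationAttempts",
                boolean.class, true);
        boolean intervalSet = invokeIfPresent(
                context, "setAttemptFailuresValidityInterval",
                long.class, 60_000L);

        System.out.println("keep containers set: " + keepSet);
        System.out.println("validity interval set: " + intervalSet);
    }
}
```

One detail that is easy to get wrong with this approach is the parameter type in the lookup: a setter taking a primitive `boolean` is not found when searched with `Boolean.class`, and the lookup then looks exactly as if the method did not exist.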
> Add high availability support for Yarn
> --------------------------------------
>
> Key: FLINK-2790
> URL: https://issues.apache.org/jira/browse/FLINK-2790
> Project: Flink
> Issue Type: Sub-task
> Components: JobManager, TaskManager
> Reporter: Till Rohrmann
> Fix For: 0.10
>
>
> Add master high availability support for Yarn. The idea is to let Yarn
> restart a failed application master in a new container. For that, we set the
> number of application retries to something greater than 1.
> From version 2.4.0 onwards, it is possible to reuse already started
> containers for the TaskManagers, thus avoiding unnecessary restart delays.
> From version 2.6.0 onwards, it is possible to specify an interval in which
> the number of application attempts has to be exceeded in order to fail the
> job. This will prevent long-running jobs from eventually depleting all
> available application attempts.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)