GitHub user rmetzger opened a pull request:
https://github.com/apache/flink/pull/468
[FLINK-1629][FLINK-1630][FLINK-1547] Rework Flink on YARN
The main change here is a reworked container scheduling logic in the YARN
ApplicationMaster.
This commit changes the following:
[FLINK-1629]: Users can now "fire and forget" jobs or YARN sessions to YARN
(detached mode).
[FLINK-1630]: Failed YARN containers are now reallocated during the
lifetime of a YARN session.
[FLINK-1547]: Users can now specify if they want the ApplicationMaster (=
the JobManager = the entire YARN session) to restart on failure, and how often.
After the first restart, the session will behave like a detached session. There
is now backup of state between the old and the new AM.
The whole resource negotiation process between the RM and the AM has been
reworked.
Flink is now much more flexible when requesting new containers and when
giving back unneeded ones.
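To illustrate the kind of bookkeeping this involves, here is a minimal sketch in Python. It is not the code from this pull request: the class `ContainerBookkeeping` and all of its method names are hypothetical, and it only models counting (target, running, pending), not the actual YARN client API. It assumes the scheduler keeps a target container count, requests replacements when a container completes or fails, and returns any containers it receives beyond the target.

```python
from dataclasses import dataclass, field


@dataclass
class ContainerBookkeeping:
    """Hypothetical bookkeeping for an ApplicationMaster-like scheduler."""

    target: int                                # containers the session should keep running
    running: set = field(default_factory=set)  # ids of containers currently in use
    pending: int = 0                           # requests sent but not yet fulfilled

    def missing(self) -> int:
        """Number of containers that still have to be requested."""
        return max(0, self.target - len(self.running) - self.pending)

    def request_missing(self) -> int:
        """Record requests for all missing containers and return how many were made."""
        n = self.missing()
        self.pending += n
        return n

    def on_allocated(self, container_ids):
        """Accept containers up to the target; return the surplus to give back."""
        surplus = []
        for cid in container_ids:
            if self.pending > 0:
                self.pending -= 1
            if len(self.running) < self.target:
                self.running.add(cid)
            else:
                surplus.append(cid)
        return surplus

    def on_completed(self, container_id) -> int:
        """Drop a finished or failed container and request a replacement if needed."""
        self.running.discard(container_id)
        return self.request_missing()
```

In this sketch, a failed container simply lowers the running count, so the next call to `request_missing` asks for exactly one replacement, and an over-allocation is handed back instead of being kept.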
A new test case tests the container restart. It also verifies that the web
frontend is started properly, that log file access is possible, and that the
configuration values the user specifies when starting the YARN session are
visible in the web frontend.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/rmetzger/flink flink-1630-rebased-final
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/flink/pull/468.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #468
----
commit ee02f92f609b2fb300c3e5af9bf75ea0745dff3b
Author: Robert Metzger <[email protected]>
Date: 2015-03-05T14:03:05Z
[FLINK-1629][FLINK-1630][FLINK-1547] Add option to start Flink on YARN in
a detached mode. YARN container reallocation.
This commit changes the following:
[FLINK-1629]: Users can now "fire and forget" jobs or YARN sessions to
YARN (detached mode).
[FLINK-1630]: Failed YARN containers are now reallocated during the
lifetime of a YARN session.
[FLINK-1547]: Users can now specify if they want the ApplicationMaster
(= the JobManager = the entire YARN session) to restart on failure, and how
often. After the first restart, the session will behave like a detached
session. There is now backup of state between the old and the new AM.
The whole resource negotiation process between the RM and the AM has
been reworked.
Flink is now much more flexible when requesting new containers and when
giving back unneeded ones.
A new test case tests the container restart. It also verifies
that the web frontend is started properly,
that log file access is possible, and
that the configuration values the user specifies when starting the YARN
session are visible in the web frontend.
----