[ https://issues.apache.org/jira/browse/FLINK-6708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16023176#comment-16023176 ]
ASF GitHub Bot commented on FLINK-6708:
---------------------------------------
GitHub user tillrohrmann opened a pull request:
https://github.com/apache/flink/pull/3982
[FLINK-6708] [yarn] Harden FlinkYarnSessionCli to handle GetClusterStatusResponse exceptions
This PR is based on #3981.
This PR hardens the FlinkYarnSessionCli by handling exceptions that occur
while retrieving the GetClusterStatusResponse. If an exception is thrown
instead of a response being returned, the CLI no longer fails; it retries
the retrieval the next time.
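The retry-instead-of-fail behaviour described above can be sketched roughly as
follows. This is a minimal illustration, not the code from this PR: the class
ClusterStatusPoller, the ClusterStatus type, and the methods tryGetStatus and
pollStatus are hypothetical names, and the Supplier stands in for whatever call
actually retrieves the GetClusterStatusResponse.

```java
import java.util.Optional;
import java.util.function.Supplier;

/** Hypothetical sketch; not the actual FlinkYarnSessionCli implementation. */
final class ClusterStatusPoller {

    /** Hypothetical stand-in for the data carried by a GetClusterStatusResponse. */
    static final class ClusterStatus {
        final int taskManagers;
        final int slots;

        ClusterStatus(int taskManagers, int slots) {
            this.taskManagers = taskManagers;
            this.slots = slots;
        }
    }

    /** Returns the status if it could be retrieved, or empty if the retrieval threw. */
    static Optional<ClusterStatus> tryGetStatus(Supplier<ClusterStatus> statusSource) {
        try {
            return Optional.ofNullable(statusSource.get());
        } catch (Exception e) {
            // Do not fail the CLI: report the problem and let the caller try again later.
            System.err.println("Could not retrieve the cluster status, will retry: " + e.getMessage());
            return Optional.empty();
        }
    }

    /** Polls a fixed number of times and returns how many polls succeeded. */
    static int pollStatus(Supplier<ClusterStatus> statusSource, int rounds) {
        int successes = 0;
        for (int i = 0; i < rounds; i++) {
            Optional<ClusterStatus> status = tryGetStatus(statusSource);
            if (status.isPresent()) {
                successes++;
                System.out.println("TaskManagers: " + status.get().taskManagers
                        + ", slots: " + status.get().slots);
            }
            // An empty result is simply skipped; the next iteration retries.
        }
        return successes;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Simulated status source that fails on every other call.
        Supplier<ClusterStatus> flaky = () -> {
            if (calls[0]++ % 2 == 0) {
                throw new IllegalStateException("status request timed out");
            }
            return new ClusterStatus(2, 8);
        };
        pollStatus(flaky, 4);
    }
}
```

The point of the sketch is only that a failed retrieval is treated as "no
status this round" rather than as a fatal error.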
cc @rmetzger.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/tillrohrmann/flink hardenYarnSession
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/flink/pull/3982.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #3982
----
commit 72ce39a1752cc19669f003b70cc2708852a06ac5
Author: Till Rohrmann <[email protected]>
Date: 2017-05-24T15:59:51Z
[FLINK-6646] [yarn] Let YarnJobManager delete Yarn application files
Previously, the YarnClusterClient decided when to delete the Yarn
application files. This is problematic because the client does not know
whether a Yarn application is being restarted or terminated. As a result,
the files were always deleted. This prevents Yarn from restarting a failed
ApplicationMaster, effectively thwarting Flink's HA capabilities.
The PR changes the behaviour such that the YarnJobManager deletes the Yarn
files if it receives a StopCluster message. That way, we can be sure that
the Yarn files are deleted only if the cluster is intended to be shut down.
commit 9227539f97e6dbc77c5367b8c555b4ba0b2ad06d
Author: Till Rohrmann <[email protected]>
Date: 2017-05-24T16:26:57Z
[FLINK-6708] [yarn] Harden FlinkYarnSessionCli to handle GetClusterStatusResponse exceptions
This PR hardens the FlinkYarnSessionCli by handling exceptions that occur
while retrieving the GetClusterStatusResponse. If an exception is thrown
instead of a response being returned, the CLI no longer fails; it retries
the retrieval the next time.
----
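The behaviour change described in the [FLINK-6646] commit message can be
illustrated with a small sketch. It assumes a simplified model: the class
YarnFileCleanupSketch, the Message enum, and the in-memory file set are all
hypothetical and only mimic the idea that the application files are removed
when a stop message arrives and are left alone otherwise.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch; not the actual YarnJobManager implementation. */
final class YarnFileCleanupSketch {

    /** Hypothetical message types; STOP_CLUSTER mirrors the StopCluster message. */
    enum Message { STOP_CLUSTER, APPLICATION_MASTER_RESTART }

    // In-memory stand-in for the Yarn application files.
    private final Set<String> applicationFiles = new HashSet<>();

    YarnFileCleanupSketch(Collection<String> files) {
        applicationFiles.addAll(files);
    }

    /** Deletes the application files only when the cluster is told to stop. */
    void handle(Message message) {
        if (message == Message.STOP_CLUSTER) {
            applicationFiles.clear();
        }
        // Any other message leaves the files in place so that a restarted
        // ApplicationMaster can still find them.
    }

    int remainingFiles() {
        return applicationFiles.size();
    }

    public static void main(String[] args) {
        YarnFileCleanupSketch jobManager =
                new YarnFileCleanupSketch(Set.of("flink.jar", "flink-conf.yaml"));
        jobManager.handle(Message.APPLICATION_MASTER_RESTART);
        System.out.println("Files after restart: " + jobManager.remainingFiles());
        jobManager.handle(Message.STOP_CLUSTER);
        System.out.println("Files after stop: " + jobManager.remainingFiles());
    }
}
```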
> Don't let the FlinkYarnSessionCli fail if it cannot retrieve the ClusterStatus
> ------------------------------------------------------------------------------
>
> Key: FLINK-6708
> URL: https://issues.apache.org/jira/browse/FLINK-6708
> Project: Flink
> Issue Type: Improvement
> Components: YARN
> Affects Versions: 1.3.0, 1.4.0
> Reporter: Till Rohrmann
> Assignee: Till Rohrmann
> Priority: Minor
>
> The {{FlinkYarnSessionCli}} should not fail if it cannot retrieve the
> {{GetClusterStatusResponse}}. This would harden Flink's Yarn session.