[ https://issues.apache.org/jira/browse/FLINK-6708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16024790#comment-16024790 ]
ASF GitHub Bot commented on FLINK-6708:
---------------------------------------
Github user tzulitai commented on a diff in the pull request:
https://github.com/apache/flink/pull/3982#discussion_r118497398
--- Diff: flink-yarn/src/main/java/org/apache/flink/yarn/cli/FlinkYarnSessionCli.java ---
@@ -413,14 +413,18 @@ public static void runInteractiveCli(YarnClusterClient yarnCluster, boolean read
             while (true) {
                 // ------------------ check if there are updates by the cluster -----------

-                GetClusterStatusResponse status = yarnCluster.getClusterStatus();
-                LOG.debug("Received status message: {}", status);
+                try {
+                    GetClusterStatusResponse status = yarnCluster.getClusterStatus();
+                    LOG.debug("Received status message: {}", status);

-                if (status != null && numTaskmanagers != status.numRegisteredTaskManagers()) {
-                    System.err.println("Number of connected TaskManagers changed to " +
+                    if (status != null && numTaskmanagers != status.numRegisteredTaskManagers()) {
+                        System.err.println("Number of connected TaskManagers changed to " +
                             status.numRegisteredTaskManagers() + ". " +
-                        "Slots available: " + status.totalNumberOfSlots());
-                    numTaskmanagers = status.numRegisteredTaskManagers();
+                            "Slots available: " + status.totalNumberOfSlots());
+                        numTaskmanagers = status.numRegisteredTaskManagers();
+                    }
+                } catch (Exception e) {
+                    LOG.warn("Could not retrieve the current cluster status. Retrying...", e);
--- End diff --
"Skipping" might be a better term here, because we aren't actually retrying
to get the cluster status, just ignoring the failure for this loop iteration.
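
To illustrate the distinction, here is a minimal, self-contained sketch of the skip-on-failure pattern. It is not the Flink code: `StatusPollSketch`, `poll`, and the `Supplier<Integer>` status source are hypothetical stand-ins for the CLI loop and `yarnCluster.getClusterStatus()`, and the loop is bounded so the example terminates.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical stand-in for the CLI status loop; not part of Flink. */
class StatusPollSketch {

    /**
     * Polls the status source once per iteration. A failed poll is reported
     * and skipped; there is no second attempt within the same iteration.
     * Returns the messages the loop would have printed or logged.
     */
    static List<String> poll(Supplier<Integer> statusSource, int iterations) {
        List<String> messages = new ArrayList<>();
        int numTaskManagers = 0;
        for (int i = 0; i < iterations; i++) {
            try {
                Integer registered = statusSource.get();
                if (registered != null && registered != numTaskManagers) {
                    messages.add("Number of connected TaskManagers changed to " + registered);
                    numTaskManagers = registered;
                }
            } catch (Exception e) {
                // The failure is only recorded; the next poll happens in the next iteration.
                messages.add("Could not retrieve the current cluster status. Skipping this iteration.");
            }
        }
        return messages;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Assumed failure mode for the demo: the second status request throws.
        Supplier<Integer> flaky = () -> {
            int n = calls[0]++;
            if (n == 1) {
                throw new IllegalStateException("status request timed out");
            }
            return n == 0 ? 1 : 2;
        };
        poll(flaky, 3).forEach(System.out::println);
    }
}
```

The catch block does not call the status source again, which is why "Skipping" describes the behavior more accurately than "Retrying": the next request only happens when the loop comes around again.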
> Don't let the FlinkYarnSessionCli fail if it cannot retrieve the ClusterStatus
> ------------------------------------------------------------------------------
>
> Key: FLINK-6708
> URL: https://issues.apache.org/jira/browse/FLINK-6708
> Project: Flink
> Issue Type: Improvement
> Components: YARN
> Affects Versions: 1.3.0, 1.4.0
> Reporter: Till Rohrmann
> Assignee: Till Rohrmann
> Priority: Minor
>
> The {{FlinkYarnSessionCli}} should not fail if it cannot retrieve the
> {{GetClusterStatusResponse}}. This would harden Flink's Yarn session.