tillrohrmann commented on a change in pull request #7356: 
[FLINK-10868][flink-yarn] Enforce maximum failed TMs in YarnResourceManager
URL: https://github.com/apache/flink/pull/7356#discussion_r250562404
 
 

 ##########
 File path: flink-yarn/src/main/java/org/apache/flink/yarn/YarnResourceManager.java
 ##########
 @@ -172,6 +176,12 @@ public YarnResourceManager(
                                        "than YARN's expiry interval ({}). The application is likely to be killed by YARN.",
                                        yarnHeartbeatIntervalMS, yarnExpiryIntervalMS);
                }
+
+               final int numInitialTM = Integer.parseInt(env.getOrDefault(
+                       YarnConfigKeys.ENV_TM_COUNT, DEFAULT_INITIAL_NUM_TASK_MANAGER));
+               this.maximumAllowedTaskManagerFailureCount =
+                       flinkConfig.getInteger(YarnConfigOptions.MAX_FAILED_CONTAINERS.key(), numInitialTM);
 
 Review comment:
   Setting `maximumAllowedTaskManagerFailureCount` here from a `YarnConfigOptions` option indicates that the failed-container handling, as currently implemented, should actually live in the `YarnResourceManager`. If we want this functionality for all `ResourceManager` implementations, then we should introduce a generic configuration option and a callback, `ResourceManager::notifyTaskManagerFailed`, so that the counting happens in the `ResourceManager`.
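
   A rough sketch of the generic variant might look like the following. It is only an illustration under stated assumptions: `ResourceManagerSketch`, `YarnLikeResourceManager`, `onContainerCompleted`, and `onMaximumFailedTaskManagersExceeded` are hypothetical stand-ins, not Flink's actual classes or methods, and the limit is assumed to come from a generic (non-YARN) option and to be exceeded only once the failure count is strictly greater than it.

```java
// Hypothetical sketch: none of these types are Flink's real classes.
public class FailureCountingSketch {

    /** Stand-in for a generic ResourceManager base class that owns the failure count. */
    abstract static class ResourceManagerSketch {
        // Assumed to be read from a generic option rather than a YARN-specific one.
        private final int maximumAllowedTaskManagerFailureCount;
        private int failedTaskManagerCount;

        ResourceManagerSketch(int maximumAllowedTaskManagerFailureCount) {
            this.maximumAllowedTaskManagerFailureCount = maximumAllowedTaskManagerFailureCount;
        }

        /** Callback that concrete resource managers invoke whenever a TaskManager fails. */
        final void notifyTaskManagerFailed(String resourceId) {
            failedTaskManagerCount++;
            // Assumption: the limit counts as exceeded only when strictly surpassed.
            if (failedTaskManagerCount > maximumAllowedTaskManagerFailureCount) {
                onMaximumFailedTaskManagersExceeded(failedTaskManagerCount);
            }
        }

        int getFailedTaskManagerCount() {
            return failedTaskManagerCount;
        }

        /** Hook deciding what happens once too many TaskManagers have failed. */
        protected abstract void onMaximumFailedTaskManagersExceeded(int failures);
    }

    /** Stand-in for a cluster-specific manager; it only reports failures upward. */
    static final class YarnLikeResourceManager extends ResourceManagerSketch {
        boolean fatalErrorReported;

        YarnLikeResourceManager(int maximumAllowedTaskManagerFailureCount) {
            super(maximumAllowedTaskManagerFailureCount);
        }

        // Hypothetical container-completion handler; only failed containers are counted.
        void onContainerCompleted(String containerId, boolean failed) {
            if (failed) {
                notifyTaskManagerFailed(containerId);
            }
        }

        @Override
        protected void onMaximumFailedTaskManagersExceeded(int failures) {
            fatalErrorReported = true;
        }
    }

    public static void main(String[] args) {
        YarnLikeResourceManager rm = new YarnLikeResourceManager(1);
        rm.onContainerCompleted("container_1", true);
        rm.onContainerCompleted("container_2", true);
        System.out.println("failed=" + rm.getFailedTaskManagerCount()
                + ", fatal=" + rm.fatalErrorReported);
    }
}
```

   With this split, the cluster-specific class only reports failures, while the limit and the counting live in one place that every `ResourceManager` could share.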

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 


With regards,
Apache Git Services
