metaswirl commented on a change in pull request #18169:
URL: https://github.com/apache/flink/pull/18169#discussion_r783922862
##########
File path: flink-yarn/src/main/java/org/apache/flink/yarn/YarnResourceManagerDriver.java
##########
@@ -684,6 +694,7 @@ public void onGetContainerStatusError(ContainerId containerId, Throwable throwab
 @Override
 public void onStopContainerError(ContainerId containerId, Throwable throwable) {
+ trackerOfReleasedResources.arriveAndDeregister();
Review comment:
I'm not sure I follow. When we start releasing a container, we count up
(register). When the release completes successfully, we count down
(deregister). The same happens when the release fails
(onStopContainerError). If we start a new release process, we still count up
at the start and count down at the end.
Are you saying that scenarios exist where we count up but never count down,
or vice versa? That can only happen if the callback is never called for some
reason, possibly because of a failure in the YARN clients. Even in this
(unlikely?) case, the containers will be killed shortly afterwards anyway
(that is, if the shutdown procedure is initiated via YARN's kill application
command).
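To make the counting argument concrete, here is a minimal sketch of the register/deregister pattern, assuming the tracker is a java.util.concurrent.Phaser (suggested by the arriveAndDeregister() call in the diff). The class ReleaseTrackerSketch and all of its method names are hypothetical and are not the actual YarnResourceManagerDriver code. The bounded wait illustrates the "callback is never called" case: the count never reaches zero, so the waiter has to give up after a timeout.

```java
import java.util.concurrent.Phaser;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical stand-in for the driver's release tracking; not Flink code.
class ReleaseTrackerSketch {
    // The owner is registered as one party so the phaser does not terminate
    // when the number of in-flight releases temporarily drops to zero.
    private final Phaser tracker = new Phaser(1);

    // Count up: called when a container release is started.
    void startRelease() {
        tracker.register();
    }

    // Count down: the release completed successfully.
    void onContainerStopped() {
        tracker.arriveAndDeregister();
    }

    // Count down as well: the release failed, but it is no longer in flight.
    void onStopContainerError(Throwable throwable) {
        tracker.arriveAndDeregister();
    }

    // Number of releases that were started but have not reported back yet.
    int pendingReleases() {
        return tracker.getRegisteredParties() - 1; // exclude the owner
    }

    // Intended to be called once, at shutdown. Returns true if every started
    // release reported back (success or error) within the timeout.
    boolean awaitReleases(long timeoutMillis) {
        int phase = tracker.arrive(); // the owner arrives without blocking
        try {
            tracker.awaitAdvanceInterruptibly(phase, timeoutMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            return false; // at least one callback was never invoked
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        ReleaseTrackerSketch sketch = new ReleaseTrackerSketch();
        sketch.startRelease();
        sketch.startRelease();
        sketch.onContainerStopped();
        sketch.onStopContainerError(new RuntimeException("simulated failure"));
        System.out.println("pending = " + sketch.pendingReleases());
        System.out.println("all released = " + sketch.awaitReleases(100));
    }
}
```

In this sketch the success and error callbacks are symmetric, so the count only stays above zero if a callback is dropped entirely, and the timeout bounds how long shutdown waits in that case.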