TaoYang526 commented on a change in pull request #11248: [FLINK-16299] Release containers recovered from previous attempt in w…
URL: https://github.com/apache/flink/pull/11248#discussion_r386823957
##########
File path: flink-yarn/src/main/java/org/apache/flink/yarn/YarnResourceManager.java
##########
@@ -464,7 +472,15 @@ public void onContainerStarted(ContainerId containerId, Map<String, ByteBuffer>
@Override
public void onContainerStatusReceived(ContainerId containerId, ContainerStatus containerStatus) {
- // We are not interested in getting container status
+ // We fetch the status of the container from the previous attempts.
+ if (containerStatus.getState() == ContainerState.NEW) {
Review comment:
> Are you suggesting that calling NMClientAsync.getContainerStatusAsync on a NEW container might result in onGetContainerStatusError on some Hadoop versions while onContainerStatusReceived on other versions?
No, the two callbacks coexist in Hadoop: onContainerStatusReceived is for containers that have already been started by the AM via NMClient#startContainers, while onGetContainerStatusError is for containers that haven't been started by the AM, or for other failures such as a lost NM.
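To make that split concrete, here is a toy model in Java. It assumes, as the comment states, that a status query for a container the AM started through NMClient#startContainers is answered via onContainerStatusReceived, while a container the AM never started, or one on a lost NM, ends up in onGetContainerStatusError. The two callback names mirror NMClientAsync.CallbackHandler, but SimpleContainerState, StatusCallbacks and StatusQuerySimulator are hypothetical stand-ins using string container IDs; this is not how the YARN client is implemented.

```java
import java.io.IOException;
import java.util.Set;

// Simplified stand-in for org.apache.hadoop.yarn.api.records.ContainerState.
enum SimpleContainerState { NEW, RUNNING, COMPLETE }

// Same method names as the status callbacks of Hadoop's
// NMClientAsync.CallbackHandler, but with plain-string container IDs.
interface StatusCallbacks {
    void onContainerStatusReceived(String containerId, SimpleContainerState state);
    void onGetContainerStatusError(String containerId, Throwable t);
}

// Hypothetical model of the routing described in the review comment.
final class StatusQuerySimulator {
    private final Set<String> startedByAm;      // containers started via startContainers
    private final boolean nodeManagerReachable; // false models a lost NM

    StatusQuerySimulator(Set<String> startedByAm, boolean nodeManagerReachable) {
        this.startedByAm = startedByAm;
        this.nodeManagerReachable = nodeManagerReachable;
    }

    void queryStatus(String containerId, StatusCallbacks callbacks) {
        if (!nodeManagerReachable) {
            // NM lost: the query itself fails.
            callbacks.onGetContainerStatusError(containerId, new IOException("NM unreachable"));
        } else if (!startedByAm.contains(containerId)) {
            // Never started by the AM: no status is delivered, only an error.
            callbacks.onGetContainerStatusError(containerId,
                    new IllegalStateException("container not started by the AM"));
        } else {
            // Started by the AM: a status report arrives. The state is fixed to
            // RUNNING only to keep the model small.
            callbacks.onContainerStatusReceived(containerId, SimpleContainerState.RUNNING);
        }
    }
}
```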
> If that is the case, I think we can have a common method handling releasing the container and removing it from the worker node map
Yes, a common method is necessary.
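A minimal sketch of such a common method, assuming a simplified resource manager whose worker node map is keyed by a string container ID. RecoveredContainerReleaser, releaseAndRemove and the nested State enum are hypothetical names; the release-request list only stands in for a call such as AMRMClientAsync#releaseAssignedContainer.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model: both status callbacks funnel into one helper.
final class RecoveredContainerReleaser {

    // Stand-in for org.apache.hadoop.yarn.api.records.ContainerState.
    enum State { NEW, RUNNING, COMPLETE }

    // Stand-in for the worker node map (container id -> host).
    private final Map<String, String> workerNodeMap = new HashMap<>();

    // Records release requests; real code would ask the YARN RM client instead.
    private final List<String> releaseRequests = new ArrayList<>();

    void addRecoveredContainer(String containerId, String host) {
        workerNodeMap.put(containerId, host);
    }

    // Common method: remove the worker first, then request the release.
    // Only containers still in the map are released, so this is idempotent.
    void releaseAndRemove(String containerId) {
        if (workerNodeMap.remove(containerId) != null) {
            releaseRequests.add(containerId);
        }
    }

    // Status path: a recovered container still in NEW is treated as unusable.
    void onContainerStatusReceived(String containerId, State state) {
        if (state == State.NEW) {
            releaseAndRemove(containerId);
        }
    }

    // Error path: handled the same way as a NEW container.
    void onGetContainerStatusError(String containerId, Throwable cause) {
        releaseAndRemove(containerId);
    }

    List<String> releaseRequests() {
        return releaseRequests;
    }

    boolean isTracked(String containerId) {
        return workerNodeMap.containsKey(containerId);
    }
}
```

Removing the map entry before recording the release keeps the helper idempotent, so a container is released at most once even if both callbacks report it; whether the real code needs that guard is an assumption of this sketch.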
> One more question, how do we know whether a container is NEW or there's some other problems in onGetContainerStatusError?
There may be several causes for this callback, such as the container not being found on the NM or the NM being unreachable, but they can be treated as the same problem: the container may not be usable for now, since we can't get its status. I think we can handle all of them as above (release the container and remove it from the worker node map), whatever the real cause is.
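A sketch of this cause-agnostic handling, under the assumption that the only thing worth keeping from the Throwable is a diagnostic message. StatusErrorHandler is a hypothetical name; in a real resource manager, a container marked unusable here would then be released and dropped from the worker node map.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: every status-query failure means "container not usable
// for now", regardless of cause. The cause is kept only for diagnostics.
final class StatusErrorHandler {
    private final Set<String> unusableContainers = new LinkedHashSet<>();
    private final List<String> diagnostics = new ArrayList<>();

    void onGetContainerStatusError(String containerId, Throwable cause) {
        // No branching on the exception type: "not found on the NM" and
        // "NM unreachable" lead to the same outcome.
        unusableContainers.add(containerId);
        diagnostics.add(containerId + ": " + cause.getClass().getSimpleName()
                + " (" + cause.getMessage() + ")");
    }

    Set<String> unusableContainers() {
        return unusableContainers;
    }

    List<String> diagnostics() {
        return diagnostics;
    }
}
```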
----------------------------------------------------------------
This is an automated message from the Apache Git Service.