TaoYang526 commented on a change in pull request #11248: [FLINK-16299] Release containers recovered from previous attempt in w…
URL: https://github.com/apache/flink/pull/11248#discussion_r386791133
##########
File path: flink-yarn/src/main/java/org/apache/flink/yarn/YarnResourceManager.java
##########
@@ -464,7 +472,15 @@ public void onContainerStarted(ContainerId containerId, Map<String, ByteBuffer>
@Override
public void onContainerStatusReceived(ContainerId containerId, ContainerStatus containerStatus) {
- // We are not interested in getting container status
+ // We fetch the status of the container from the previous attempts.
+ if (containerStatus.getState() == ContainerState.NEW) {
Review comment:
ContainerStatus#getState() may only return RUNNING (meaning the container is starting or has started) or COMPLETE (meaning it has finished) in most Hadoop versions; only a few versions may also return NEW or SCHEDULED. So I think this condition could be expressed as "not RUNNING" here. We should also add a condition like `if (containerStatus.getState() != ContainerState.COMPLETE)` around the resourceManagerClient#releaseAssignedContainer call, since there is no need to release a container that has already completed.
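A minimal sketch of the release guard suggested above. It assumes a local stand-in enum in place of Hadoop's `ContainerState` so that it runs without Hadoop on the classpath; the class name and the `needsRelease` helper are hypothetical and not part of Flink:

```java
public class RecoveredContainerCheck {
    // Local stand-in mirroring the constants of Hadoop's
    // org.apache.hadoop.yarn.api.records.ContainerState; most Hadoop versions
    // only report RUNNING or COMPLETE in a container status.
    enum ContainerState { NEW, RUNNING, COMPLETE, SCHEDULED }

    // Hypothetical helper: a container recovered from the previous attempt
    // only needs to be released if it has not already completed.
    static boolean needsRelease(ContainerState state) {
        return state != ContainerState.COMPLETE;
    }

    public static void main(String[] args) {
        for (ContainerState state : ContainerState.values()) {
            System.out.println(state + " -> release: " + needsRelease(state));
        }
    }
}
```

Keying the guard on "not COMPLETE" rather than on one specific state keeps it correct regardless of whether a given Hadoop version ever reports NEW or SCHEDULED.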
We should also handle this in the onGetContainerStatusError method, for containers that the last AM had not yet started via NMClient#startContainer.
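A rough sketch of the error-path handling, under the assumption that a status error for a recovered container means the previous AM never started it. Container IDs are plain strings here, and `trackRecovered`, `pendingRecovered` and the `released` list (standing in for resourceManagerClient#releaseAssignedContainer) are hypothetical names for illustration only:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class StatusErrorHandling {
    // Hypothetical bookkeeping: IDs of containers recovered from the previous
    // attempt whose status has been requested but not yet resolved.
    private final Set<String> pendingRecovered = new HashSet<>();

    // Stand-in for resourceManagerClient#releaseAssignedContainer: records released IDs.
    final List<String> released = new ArrayList<>();

    void trackRecovered(String containerId) {
        pendingRecovered.add(containerId);
    }

    // Mirrors the shape of NMClientAsync's onGetContainerStatusError(ContainerId, Throwable).
    // Assumption: an error for a recovered container means it was never started,
    // so its allocation is released; unknown IDs are ignored.
    void onGetContainerStatusError(String containerId, Throwable t) {
        if (pendingRecovered.remove(containerId)) {
            released.add(containerId);
        }
    }

    public static void main(String[] args) {
        StatusErrorHandling handler = new StatusErrorHandling();
        handler.trackRecovered("container_01");
        handler.onGetContainerStatusError("container_01", new RuntimeException("not started"));
        handler.onGetContainerStatusError("container_02", new RuntimeException("unknown"));
        System.out.println(handler.released); // [container_01]
    }
}
```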
My last suggestion is to consider consistency, since internal state may be updated inside this callback; that can be handled by calling runAsync(...).
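To illustrate the runAsync(...) idea, here is a sketch in which a single-threaded executor stands in for the resource manager's main thread and the local `runAsync` method stands in for Flink's RpcEndpoint#runAsync. The counter is a hypothetical piece of internal state: callback threads only enqueue work, so it is never mutated concurrently.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class MainThreadHandoff {
    // Single-threaded executor standing in for the resource manager's main thread.
    private final ExecutorService mainThread = Executors.newSingleThreadExecutor();

    // Hypothetical internal state; only ever read or written on mainThread.
    private int handledCount = 0;

    // Stand-in for RpcEndpoint#runAsync(Runnable): enqueue work on the main thread.
    void runAsync(Runnable action) {
        mainThread.execute(action);
    }

    // Called from NM client callback threads; the state change is deferred to the main thread.
    void handleStatusCallback() {
        runAsync(() -> handledCount++);
    }

    // Reads the state on the main thread; Future#get makes the value visible to the caller.
    int readCount() throws Exception {
        return mainThread.submit(() -> handledCount).get();
    }

    void shutdown() {
        mainThread.shutdown();
    }

    public static void main(String[] args) throws Exception {
        MainThreadHandoff rm = new MainThreadHandoff();
        Thread[] callbackThreads = new Thread[4];
        for (int i = 0; i < callbackThreads.length; i++) {
            callbackThreads[i] = new Thread(() -> {
                for (int j = 0; j < 1000; j++) {
                    rm.handleStatusCallback();
                }
            });
            callbackThreads[i].start();
        }
        for (Thread t : callbackThreads) {
            t.join();
        }
        // All 4000 increments were queued before this read, and the executor runs tasks in order.
        System.out.println(rm.readCount()); // 4000
        rm.shutdown();
    }
}
```

Without the handoff, four threads doing `handledCount++` directly could lose updates; routing every mutation through one thread avoids that without adding locks.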