zlzhang0122 commented on a change in pull request #16847:
URL: https://github.com/apache/flink/pull/16847#discussion_r694559238
##########
File path:
flink-yarn/src/main/java/org/apache/flink/yarn/YarnResourceManagerDriver.java
##########
@@ -689,4 +690,54 @@ public void onStopContainerError(ContainerId containerId, Throwable throwable) {
throwable);
}
}
+
+ public String getConatainerCompletedCause(ContainerStatus containerStatus) {
+ String completeContainerMessage = containerStatus.getDiagnostics();
+ switch (containerStatus.getExitStatus()) {
+ case ContainerExitStatus.SUCCESS:
+ log.debug(
+ "Executor for container {} exited because of a YARN event (e.g., "
+ + "preemption) and not because of an error in the running job. Diagnostics: {}",
+ containerStatus.getContainerId().toString(),
+ completeContainerMessage);
+ break;
+ case ContainerExitStatus.PREEMPTED:
+ completeContainerMessage =
+ String.format(
+ "Container %s was preempted by yarn. Diagnostics: %s",
+ containerStatus.getContainerId().toString(),
+ completeContainerMessage);
+ break;
+ case ContainerExitStatus.INVALID:
+ completeContainerMessage =
+ String.format(
+ "Container %s was invalid. Diagnostics: %s",
+ containerStatus.getContainerId().toString(),
+ completeContainerMessage);
+ break;
+ case ContainerExitStatus.ABORTED:
+ completeContainerMessage =
+ String.format(
+ "Container %s killed by YARN for being released by the application or being 'lost' due to node failures etc. Diagnostics: %s",
Review comment:
IMO, the container being released by the application or being 'lost' due to
node failures is the reason it was killed by YARN. YARN can't kill it for no
reason, so maybe we could change 'for' to 'because' in this message?
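
For illustration, the reworded message might look roughly like the sketch
below. The class and helper names are hypothetical and not part of the PR, and
the exact wording is only one possible reading of the suggestion; the helper
takes plain strings so that it runs without the YARN classes.

```java
// Hypothetical, standalone sketch; not code from the pull request.
class AbortedMessageSketch {

    // Builds the ABORTED-case message with "because" stating the cause explicitly.
    // containerId and diagnostics stand in for containerStatus.getContainerId().toString()
    // and containerStatus.getDiagnostics() in the diff above.
    static String abortedMessage(String containerId, String diagnostics) {
        return String.format(
                "Container %s was killed by YARN because it was released by the application "
                        + "or was 'lost' due to node failures etc. Diagnostics: %s",
                containerId,
                diagnostics);
    }

    public static void main(String[] args) {
        System.out.println(abortedMessage("container_01", "none"));
    }
}
```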
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.