[
https://issues.apache.org/jira/browse/MAPREDUCE-7407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17604860#comment-17604860
]
ASF GitHub Bot commented on MAPREDUCE-7407:
-------------------------------------------
ashutoshcipher commented on code in PR #4779:
URL: https://github.com/apache/hadoop/pull/4779#discussion_r971090194
##########
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/launcher/ContainerLauncherImpl.java:
##########
@@ -385,6 +385,20 @@ public void run() {
// TODO: Do it only once per NodeManager.
ContainerId containerID = event.getContainerID();
+ // If the container failed to launch earlier (due to dead node for example),
+ // it has been marked as FAILED and removed from containers during
+ // CONTAINER_REMOTE_LAUNCH event handling.
+ // Skip kill() such container during CONTAINER_REMOTE_CLEANUP as
+ // it is not necessary and could cost 15 minutes delay if the node is dead.
+ if (event.getType() == EventType.CONTAINER_REMOTE_CLEANUP &&
Review Comment:
I am doing this outside the switch statement because I don't want execution to
reach the `getContainer` call, which comes before the switch. When the ID is
missing, `getContainer` creates a new container and puts it into `containers`,
so this check, `!containers.containsKey(containerID)`, would then always fail.
```
Container c = getContainer(event);
```
```
private Container getContainer(ContainerLauncherEvent event) {
  ContainerId id = event.getContainerID();
  Container c = containers.get(id);
  if(c == null) {
    System.out.println("entering null");
    c = new Container(event.getTaskAttemptID(), event.getContainerID(),
        event.getContainerMgrAddress());
    Container old = containers.putIfAbsent(id, c);
    if(old != null) {
      c = old;
    }
  }
  return c;
}
```
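To illustrate the ordering issue, here is a minimal, self-contained sketch. It
is not the Hadoop code: `CleanupSkipSketch`, the `String` IDs, the `Object`
values, and both `shouldSkip...` helpers are hypothetical stand-ins that only
mirror the get-or-create shape of `getContainer` shown above.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical, simplified stand-in for the launcher; not the real class.
class CleanupSkipSketch {
  enum EventType { CONTAINER_REMOTE_LAUNCH, CONTAINER_REMOTE_CLEANUP }

  // Stand-in for the "containers" map, keyed by a plain String ID.
  static final ConcurrentMap<String, Object> containers =
      new ConcurrentHashMap<>();

  // Mirrors getContainer(): a miss inserts a fresh entry into the map.
  static Object getContainer(String id) {
    Object c = containers.get(id);
    if (c == null) {
      c = new Object();
      Object old = containers.putIfAbsent(id, c);
      if (old != null) {
        c = old;
      }
    }
    return c;
  }

  // Check placed before getContainer(): an absent ID is still visible.
  static boolean shouldSkipCheckedFirst(EventType type, String id) {
    return type == EventType.CONTAINER_REMOTE_CLEANUP
        && !containers.containsKey(id);
  }

  // Check placed after getContainer(): the lookup already re-created the
  // entry, so containsKey() is true and the skip never triggers.
  static boolean shouldSkipCheckedAfter(EventType type, String id) {
    getContainer(id);
    return type == EventType.CONTAINER_REMOTE_CLEANUP
        && !containers.containsKey(id);
  }

  public static void main(String[] args) {
    containers.clear();
    System.out.println(
        shouldSkipCheckedFirst(EventType.CONTAINER_REMOTE_CLEANUP, "c1"));
    System.out.println(
        shouldSkipCheckedAfter(EventType.CONTAINER_REMOTE_CLEANUP, "c1"));
  }
}
```

In this sketch the first call reports that cleanup can be skipped, while the
second does not, because the lookup has already put the ID back into the map.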
> Avoid stopContainer() on dead node
> ----------------------------------
>
> Key: MAPREDUCE-7407
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7407
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Affects Versions: 3.3.4
> Reporter: Ashutosh Gupta
> Assignee: Ashutosh Gupta
> Priority: Major
> Labels: pull-request-available
>
> If a container failed to launch earlier due to terminated instances, it has
> already been removed from the container hash map. Avoiding the kill() for
> CONTAINER_REMOTE_CLEANUP will avoid wasting 15min per container on
> retries/timeout.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)