[ https://issues.apache.org/jira/browse/MAPREDUCE-7407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17604860#comment-17604860 ]
ASF GitHub Bot commented on MAPREDUCE-7407:
-------------------------------------------

ashutoshcipher commented on code in PR #4779:
URL: https://github.com/apache/hadoop/pull/4779#discussion_r971090194


##########
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/launcher/ContainerLauncherImpl.java:
##########
@@ -385,6 +385,20 @@ public void run() {
       // TODO: Do it only once per NodeManager.
       ContainerId containerID = event.getContainerID();
 
+      // If the container failed to launch earlier (due to dead node for example),
+      // it has been marked as FAILED and removed from containers during
+      // CONTAINER_REMOTE_LAUNCH event handling.
+      // Skip kill() such container during CONTAINER_REMOTE_CLEANUP as
+      // it is not necessary and could cost 15 minutes delay if the node is dead.
+      if (event.getType() == EventType.CONTAINER_REMOTE_CLEANUP &&

Review Comment:
   I am doing this outside the switch statement because I don't want this case to reach the `getContainer` call, which runs before the switch. When the container is missing, `getContainer` creates a new one and puts it into `containers`, so the `!containers.containsKey(containerID)` check would no longer hold afterwards and the cleanup would never be skipped.

   ```
   Container c = getContainer(event);
   ```

   ```
   private Container getContainer(ContainerLauncherEvent event) {
     ContainerId id = event.getContainerID();
     Container c = containers.get(id);
     if(c == null) {
       System.out.println("entering null");
       c = new Container(event.getTaskAttemptID(), event.getContainerID(),
           event.getContainerMgrAddress());
       Container old = containers.putIfAbsent(id, c);
       if(old != null) {
         c = old;
       }
     }
     return c;
   }
   ```


> Avoid stopContainer() on dead node
> ----------------------------------
>
>                 Key: MAPREDUCE-7407
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7407
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>   Affects Versions: 3.3.4
>            Reporter: Ashutosh Gupta
>            Assignee: Ashutosh Gupta
>            Priority: Major
>              Labels: pull-request-available
>
> If a container failed to launch earlier due to terminated instances, it has
> already been removed from the container hash map. Avoiding the kill() for
> CONTAINER_REMOTE_CLEANUP will avoid wasting 15min per container on
> retries/timeout.
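
For illustration only (not part of the PR): a minimal, self-contained sketch of why the order matters. The class `CleanupSkipSketch`, its method names, and the use of `String` ids with `Object` values are hypothetical stand-ins; only the get-or-create shape of `getContainer` is taken from the snippet quoted in the review comment.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical stand-in for the launcher; not the real ContainerLauncherImpl.
class CleanupSkipSketch {
  // Stand-in for the `containers` map: container id -> container.
  private final ConcurrentMap<String, Object> containers =
      new ConcurrentHashMap<>();

  // Same get-or-create shape as the quoted getContainer():
  // a miss inserts a fresh entry as a side effect.
  Object getContainer(String id) {
    Object c = containers.get(id);
    if (c == null) {
      c = new Object();
      Object old = containers.putIfAbsent(id, c);
      if (old != null) {
        c = old;
      }
    }
    return c;
  }

  // Check placed before the lookup: an absent container is still absent.
  boolean skipWhenCheckedBeforeLookup(String id) {
    return !containers.containsKey(id);
  }

  // Check placed after the lookup: getContainer() has already re-added the
  // entry, so the "absent" condition can no longer be observed.
  boolean skipWhenCheckedAfterLookup(String id) {
    getContainer(id);
    return !containers.containsKey(id);
  }

  public static void main(String[] args) {
    CleanupSkipSketch sketch = new CleanupSkipSketch();
    // The container was never added (or was removed after a failed launch).
    System.out.println(sketch.skipWhenCheckedBeforeLookup("container_1")); // true
    System.out.println(sketch.skipWhenCheckedAfterLookup("container_1"));  // false
  }
}
```

Under these assumptions, only the check that runs before `getContainer` can detect a container that was removed after a failed launch, which matches the reasoning given for placing the new `if` ahead of the lookup and the switch.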