[ https://issues.apache.org/jira/browse/GOBBLIN-2189?focusedWorklogId=953893&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-953893 ]
ASF GitHub Bot logged work on GOBBLIN-2189:
-------------------------------------------

                Author: ASF GitHub Bot
            Created on: 23/Jan/25 15:09
            Start Date: 23/Jan/25 15:09
    Worklog Time Spent: 10m
      Work Description: Blazer-007 commented on code in PR #4092:
URL: https://github.com/apache/gobblin/pull/4092#discussion_r1927140164


##########
gobblin-temporal/src/main/java/org/apache/gobblin/temporal/yarn/DynamicScalingYarnService.java:
##########
@@ -62,6 +79,71 @@ protected synchronized void requestInitialContainers() {
     requestNewContainersForStaffingDeltas(deltas);
   }
 
+  /**
+   * Handle the completion of a container. A new container will be requested to replace the one
+   * that just exited depending on the exit status.
+   * <p>
+   * A container completes in either of the following conditions:
+   * <ol>
+   *   <li> The container gets stopped by the ApplicationMaster. </li>
+   *   <li> Some error happens in the container and caused the container to exit </li>
+   *   <li> The container gets preempted by the ResourceManager </li>
+   *   <li> The container gets killed due to some reason, for example, if it runs over the allowed amount of virtual or physical memory </li>
+   * </ol>
+   * A replacement container is needed in all except the first case.
+   * </p>
+   */
+  @Override
+  protected void handleContainerCompletion(ContainerStatus containerStatus) {
+    ContainerId completedContainerId = containerStatus.getContainerId();
+    ContainerInfo completedContainerInfo = this.containerMap.remove(completedContainerId);
+
+    if (completedContainerInfo == null) {
+      log.warn("Container {} not found in containerMap. This container onContainersCompleted() likely called before onContainersAllocated()",

Review Comment:
   **I forgot to add the original comment here, will add in next revision**

   But simple reason for this is that callbacks are asynchronously called


Issue Time Tracking
-------------------

    Worklog Id:     (was: 953893)
    Time Spent: 0.5h  (was: 20m)

> Implement ContainerCompletion callback in DynamicScalingYarnService
> -------------------------------------------------------------------
>
>                 Key: GOBBLIN-2189
>                 URL: https://issues.apache.org/jira/browse/GOBBLIN-2189
>             Project: Apache Gobblin
>          Issue Type: Improvement
>          Components: gobblin-core
>            Reporter: Vivek Rai
>            Assignee: Abhishek Tiwari
>            Priority: Major
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> DynamicScalingYarnService currently doesn't handle scaling down containers
> and neither does anything if container is killed abruptly or goes OOM. So to
> handle this scenario containerCompletion callback should be implemented to
> launch the replacement containers and also scaling down handling should be
> done.


--
This message was sent by Atlassian Jira
(v8.20.10#820010)