[ https://issues.apache.org/jira/browse/GOBBLIN-2189?focusedWorklogId=955755&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-955755 ]
ASF GitHub Bot logged work on GOBBLIN-2189:
-------------------------------------------

                Author: ASF GitHub Bot
            Created on: 06/Feb/25 03:53
            Start Date: 06/Feb/25 03:53
    Worklog Time Spent: 10m
      Work Description: Blazer-007 commented on code in PR #4092:
URL: https://github.com/apache/gobblin/pull/4092#discussion_r1944058346


##########
gobblin-temporal/src/main/java/org/apache/gobblin/temporal/yarn/DynamicScalingYarnService.java:
##########
@@ -62,6 +82,73 @@ protected synchronized void requestInitialContainers() {
     requestNewContainersForStaffingDeltas(deltas);
   }

+  /**
+   * Handle the completion of a container. A new container will be requested to replace the one
+   * that just exited depending on the exit status.
+   * <p>
+   * A container completes in either of the following conditions:
+   * <ol>
+   *   <li> The container gets stopped by the ApplicationMaster. </li>
+   *   <li> Some error happens in the container and caused the container to exit </li>
+   *   <li> The container gets preempted by the ResourceManager </li>
+   *   <li> The container gets killed due to some reason, for example, if it runs over the allowed amount of virtual or physical memory </li>
+   * </ol>
+   * A replacement container is needed in all except the first case.
+   * </p>
+   */
+  @Override
+  protected void handleContainerCompletion(ContainerStatus containerStatus) {

Review Comment:
   The containerId removed in `handleContainerCompletion` is not the same one that gets removed inside `reviseWorkforcePlanAndRequestNewContainers`. As for `removedContainerIds`, we add to it in one place and remove from it in the other. Even if the calls interleave, the two paths work with different containerIds, so the chance of an inconsistent state is very low, especially since both are thread-safe data structures.


Issue Time Tracking
-------------------

    Worklog Id:     (was: 955755)
    Time Spent: 2h  (was: 1h 50m)

> Implement ContainerCompletion callback in DynamicScalingYarnService
> -------------------------------------------------------------------
>
>                 Key: GOBBLIN-2189
>                 URL: https://issues.apache.org/jira/browse/GOBBLIN-2189
>             Project: Apache Gobblin
>          Issue Type: Improvement
>          Components: gobblin-core
>            Reporter: Vivek Rai
>            Assignee: Abhishek Tiwari
>            Priority: Major
>          Time Spent: 2h
>  Remaining Estimate: 0h
>
> DynamicScalingYarnService currently does not handle scaling down containers,
> nor does it do anything if a container is killed abruptly or goes OOM. To
> handle these scenarios, a containerCompletion callback should be implemented
> to launch replacement containers, and scale-down handling should be added.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
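
The review comment above argues that interleaved calls are safe because one path only adds IDs, the other only removes them, and the two never touch the same containerId. Below is a minimal sketch of that argument, not the actual Gobblin code. The class name, method names, and ID strings are hypothetical. Only the name `removedContainerIds` comes from the comment; its real type is not shown in this message, so a `ConcurrentHashMap`-backed set is assumed here.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class RemovedContainerIdsSketch {
    // Assumption: a thread-safe set stands in for removedContainerIds.
    private final Set<String> removedContainerIds = ConcurrentHashMap.newKeySet();

    // Hypothetical "adding" path: records one container ID.
    void recordId(String containerId) {
        removedContainerIds.add(containerId);
    }

    // Hypothetical "removing" path: drops one container ID if present.
    boolean dropId(String containerId) {
        return removedContainerIds.remove(containerId);
    }

    // Runs both paths concurrently on disjoint IDs and returns the final set.
    static Set<String> run() {
        RemovedContainerIdsSketch sketch = new RemovedContainerIdsSketch();
        int n = 1000;

        // IDs that only the removing thread will touch.
        for (int i = 0; i < n; i++) {
            sketch.recordId("old-" + i);
        }

        // The adding thread touches only "new-*" IDs.
        Thread adder = new Thread(() -> {
            for (int i = 0; i < n; i++) {
                sketch.recordId("new-" + i);
            }
        });

        // The removing thread touches only "old-*" IDs.
        Thread remover = new Thread(() -> {
            for (int i = 0; i < n; i++) {
                sketch.dropId("old-" + i);
            }
        });

        adder.start();
        remover.start();
        try {
            adder.join();
            remover.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for threads", e);
        }
        return sketch.removedContainerIds;
    }

    public static void main(String[] args) {
        // Every "old-*" ID is gone and every "new-*" ID is present.
        System.out.println(run().size()); // prints 1000
    }
}
```

Because the two threads operate on disjoint keys of a thread-safe set, the final contents do not depend on how the calls interleave. The sketch does not cover compound check-then-act sequences on the same ID, which would still need their own synchronization.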