[jira] [Work logged] (GOBBLIN-2189) Implement ContainerCompletion callback in DynamicScalingYarnService

ASF GitHub Bot (Jira) Wed, 05 Feb 2025 19:55:01 -0800


     [ 
https://issues.apache.org/jira/browse/GOBBLIN-2189?focusedWorklogId=955755&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-955755
 ]


ASF GitHub Bot logged work on GOBBLIN-2189:
-------------------------------------------

                Author: ASF GitHub Bot
            Created on: 06/Feb/25 03:53
            Start Date: 06/Feb/25 03:53
    Worklog Time Spent: 10m 
      Work Description: Blazer-007 commented on code in PR #4092:
URL: https://github.com/apache/gobblin/pull/4092#discussion_r1944058346


##########
gobblin-temporal/src/main/java/org/apache/gobblin/temporal/yarn/DynamicScalingYarnService.java:
##########
@@ -62,6 +82,73 @@ protected synchronized void requestInitialContainers() {
     requestNewContainersForStaffingDeltas(deltas);
   }
 
+  /**
+   * Handle the completion of a container. A new container will be requested to replace the one
+   * that just exited depending on the exit status.
+   * <p>
+   * A container completes in either of the following conditions:
+   * <ol>
+   *   <li> The container gets stopped by the ApplicationMaster. </li>
+   *   <li> Some error happens in the container and caused the container to exit </li>
+   *   <li> The container gets preempted by the ResourceManager </li>
+   *   <li> The container gets killed due to some reason, for example, if it runs over the allowed amount of virtual or physical memory </li>
+   * </ol>
+   * A replacement container is needed in all except the first case.
+   * </p>
+   */
+  @Override
+  protected void handleContainerCompletion(ContainerStatus containerStatus) {

Review Comment:
   In `handleContainerCompletion`, the containerId that gets removed is not the 
same one that is removed inside `reviseWorkforcePlanAndRequestNewContainers`. 
As for `removedContainerIds`, one place adds to it and the other removes from 
it. Even if the two calls interleave, they operate on different containerIds, 
so the chance of an inconsistent state is very low, especially since both are 
thread-safe data structures.
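
   The reasoning above can be illustrated with a small toy model. This is only 
a sketch and not the code in the PR: the class name, the method names 
`releaseForScaleDown` and `onContainerCompleted`, the String ids, and the 
choice of `ConcurrentHashMap` for both structures are assumptions made for 
illustration. Only the name `removedContainerIds` comes from the discussion. 
It runs a completion path and a scale-down path on two threads over disjoint 
containerIds and reports the resulting bookkeeping.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Toy model only: hypothetical stand-in for the service's bookkeeping.
public class ContainerBookkeepingSketch {
  // Assumed thread-safe structures; the real field types may differ.
  private final ConcurrentHashMap<String, String> containerMap = new ConcurrentHashMap<>();
  private final Set<String> removedContainerIds = ConcurrentHashMap.newKeySet();
  private final AtomicInteger replacementsRequested = new AtomicInteger();

  void register(String containerId) {
    containerMap.put(containerId, "worker");
  }

  // Scale-down path (hypothetical): mark the id as intentionally removed, then drop it.
  void releaseForScaleDown(String containerId) {
    removedContainerIds.add(containerId);
    containerMap.remove(containerId);
  }

  // Completion path (hypothetical): drop the id and count a replacement
  // unless the id had been marked as intentionally removed.
  void onContainerCompleted(String containerId) {
    containerMap.remove(containerId);
    if (!removedContainerIds.remove(containerId)) {
      replacementsRequested.incrementAndGet();
    }
  }

  // Runs both paths concurrently over disjoint ids and returns
  // {live containers, replacements requested, ids still marked removed}.
  static int[] simulate(int perPath) {
    ContainerBookkeepingSketch s = new ContainerBookkeepingSketch();
    for (int i = 0; i < perPath; i++) {
      s.register("crashed-" + i);
      s.register("released-" + i);
      s.register("idle-" + i);
    }
    Thread completion = new Thread(() -> {
      for (int i = 0; i < perPath; i++) {
        s.onContainerCompleted("crashed-" + i);
      }
    });
    Thread scaleDown = new Thread(() -> {
      for (int i = 0; i < perPath; i++) {
        s.releaseForScaleDown("released-" + i);
      }
    });
    completion.start();
    scaleDown.start();
    try {
      completion.join();
      scaleDown.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while waiting for the simulation", e);
    }
    return new int[] {
        s.containerMap.size(), s.replacementsRequested.get(), s.removedContainerIds.size()};
  }

  public static void main(String[] args) {
    int[] r = simulate(1000);
    System.out.println("live=" + r[0] + " replacements=" + r[1] + " markedRemoved=" + r[2]);
  }
}
```

   Because the two threads never touch the same containerId, the three counts 
each come out equal to `perPath` regardless of how the calls interleave. The 
sketch does not cover the case where both paths act on the same containerId, 
which would need a separate look.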





Issue Time Tracking
-------------------

    Worklog Id:     (was: 955755)
    Time Spent: 2h  (was: 1h 50m)

> Implement ContainerCompletion callback in DynamicScalingYarnService
> -------------------------------------------------------------------
>
>                 Key: GOBBLIN-2189
>                 URL: https://issues.apache.org/jira/browse/GOBBLIN-2189
>             Project: Apache Gobblin
>          Issue Type: Improvement
>          Components: gobblin-core
>            Reporter: Vivek Rai
>            Assignee: Abhishek Tiwari
>            Priority: Major
>          Time Spent: 2h
>  Remaining Estimate: 0h
>
> DynamicScalingYarnService currently does not handle scaling down containers, 
> nor does it do anything when a container is killed abruptly or goes OOM. To 
> handle these scenarios, a containerCompletion callback should be implemented 
> to launch replacement containers, and scale-down handling should be added.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
