This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
     new 0467aca97120 [SPARK-49783][YARN] Fix resource leak of yarn allocator
0467aca97120 is described below

commit 0467aca97120e493a3b02c5ff9460042dcdf233d
Author: Junfan Zhang <[email protected]>
AuthorDate: Mon Jan 6 20:00:08 2025 -0800

    [SPARK-49783][YARN] Fix resource leak of yarn allocator
    
    ### What changes were proposed in this pull request?
    
    Fix the resource leak of yarn allocator
    
    ### Why are the changes needed?
    
    When the target is less than the number of running containers, the containers assigned by the ResourceManager are skipped, but they are not released by invoking amClient.releaseAssignedContainer. As a result, these containers stay reserved in the YARN ResourceManager for at least 10 minutes, and a large share of cluster resources is wasted.
    
    This also shows up in the statistics: the vcore * seconds reported by YARN will be greater than the value derived from the Spark event logs.
    
    From my statistics, the cluster resource waste ratio is ~25% when the cluster runs only Spark jobs.
    
    ### Does this PR introduce _any_ user-facing change?
    
    No
    
    ### How was this patch tested?
    
    In our internal hadoop cluster
    
    ### Was this patch authored or co-authored using generative AI tooling?
    
    No
    
    Closes #48238 from zuston/patch-1.
    
    Authored-by: Junfan Zhang <[email protected]>
    Signed-off-by: Dongjoon Hyun <[email protected]>
---
 .../yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala | 1 +
 1 file changed, 1 insertion(+)

diff --git a/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala b/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala
index b593d73f8572..911ce2e27f96 100644
--- a/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala
+++ b/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala
@@ -820,6 +820,7 @@ private[yarn] class YarnAllocator(
         logInfo(log"Skip launching executorRunnable as running executors count: " +
           log"${MDC(LogKeys.COUNT, rpRunningExecs)} reached target executors count: " +
           log"${MDC(LogKeys.NUM_EXECUTOR_TARGET, getOrUpdateTargetNumExecutorsForRPId(rpId))}.")
+        internalReleaseContainer(container)
       }
     }
   }
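
The leak described in the commit message can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions: `Container`, `FakeAmClient`, and `SketchAllocator` are hypothetical stand-ins invented for illustration, not the real `YarnAllocator` or YARN client classes, and the `releaseSkipped` flag exists only to compare behavior before and after the fix.

```scala
// Hypothetical, simplified model of the leak; not Spark or YARN code.
final case class Container(id: Int)

// Stand-in for the resource manager's view: containers it still holds for us.
final class FakeAmClient {
  private var held = Set.empty[Container]
  def assign(c: Container): Unit = held += c
  def releaseAssignedContainer(c: Container): Unit = held -= c
  def heldCount: Int = held.size
}

// Launches executors until `target` is reached and skips the rest.
// `releaseSkipped` toggles the behavior the one-line fix adds.
final class SketchAllocator(client: FakeAmClient, target: Int, releaseSkipped: Boolean) {
  private var running = Set.empty[Container]

  def handleAssigned(containers: Seq[Container]): Unit =
    containers.foreach { c =>
      client.assign(c)
      if (running.size < target) {
        running += c // an executor would be launched in this container
      } else if (releaseSkipped) {
        client.releaseAssignedContainer(c) // skipped, so hand it back at once
      }
      // Otherwise the container is skipped but stays reserved: the leak.
    }

  def runningCount: Int = running.size
}

object LeakDemo {
  def main(args: Array[String]): Unit = {
    val assigned = (1 to 5).map(Container(_))

    val leakyClient = new FakeAmClient
    val leaky = new SketchAllocator(leakyClient, target = 2, releaseSkipped = false)
    leaky.handleAssigned(assigned)
    println(s"before fix: running=${leaky.runningCount}, held=${leakyClient.heldCount}")

    val fixedClient = new FakeAmClient
    val fixed = new SketchAllocator(fixedClient, target = 2, releaseSkipped = true)
    fixed.handleAssigned(assigned)
    println(s"after fix: running=${fixed.runningCount}, held=${fixedClient.heldCount}")
  }
}
```

In this model, the containers held beyond the running count correspond to resources that YARN accounts for but Spark never uses, which would explain the gap between the two vcore * seconds figures mentioned above.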


