This is an automated email from the ASF dual-hosted git repository.
dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
new 0467aca97120 [SPARK-49783][YARN] Fix resource leak of yarn allocator
0467aca97120 is described below
commit 0467aca97120e493a3b02c5ff9460042dcdf233d
Author: Junfan Zhang <[email protected]>
AuthorDate: Mon Jan 6 20:00:08 2025 -0800
[SPARK-49783][YARN] Fix resource leak of yarn allocator
### What changes were proposed in this pull request?
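Fix a resource leak in the YARN allocator: release containers that the resource manager has assigned but that are skipped because the target executor count has already been reached.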
### Why are the changes needed?
When the target executor count is lower than the number of running containers,
containers assigned by the resource manager are skipped, but they are not
released by invoking `amClient.releaseAssignedContainer`. These containers then
stay reserved in the YARN ResourceManager for at least 10 minutes, so a large
share of the cluster's resources is wasted.
This also shows up in accounting: the vcore-seconds reported on the YARN side
are greater than the figure derived from the Spark event logs.
By my measurements, the cluster resource waste ratio is ~25% when the cluster
runs Spark jobs exclusively.
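The leak described above can be illustrated with a small toy model. This is only a sketch under stated assumptions: `FakeAmClient`, `ToyAllocator`, `Container`, and `LeakDemo` are hypothetical stand-ins written for this example, not Spark or Hadoop classes, and the model ignores timeouts, resource profiles, and everything else the real allocator handles. It shows only that a skipped container stays held unless it is explicitly released.

```scala
// Toy model only: every type here is a hypothetical stand-in,
// not a Spark or Hadoop class.
import scala.collection.mutable

final case class Container(id: Int)

// Tracks which containers the resource manager still considers held by the app.
final class FakeAmClient {
  val held: mutable.Set[Int] = mutable.Set.empty[Int]
  def assign(c: Container): Unit = held += c.id
  def releaseAssignedContainer(c: Container): Unit = held -= c.id
}

final class ToyAllocator(client: FakeAmClient, target: Int, releaseSkipped: Boolean) {
  val running: mutable.Set[Int] = mutable.Set.empty[Int]

  def handleAllocated(containers: Seq[Container]): Unit =
    containers.foreach { c =>
      client.assign(c)
      if (running.size < target) {
        running += c.id // launch an executor in this container
      } else if (releaseSkipped) {
        client.releaseAssignedContainer(c) // skipped: hand it back right away
      }
      // Otherwise the container is skipped but never released,
      // so the resource manager keeps it reserved.
    }

  // Containers still held by the resource manager with no executor in them.
  def leaked: Set[Int] = client.held.toSet -- running
}

object LeakDemo {
  def leakedAfter(releaseSkipped: Boolean): Int = {
    val allocator =
      new ToyAllocator(new FakeAmClient, target = 2, releaseSkipped = releaseSkipped)
    allocator.handleAllocated((1 to 5).map(Container(_)))
    allocator.leaked.size
  }

  def main(args: Array[String]): Unit = {
    println(s"without release: ${leakedAfter(releaseSkipped = false)} leaked")
    println(s"with release: ${leakedAfter(releaseSkipped = true)} leaked")
  }
}
```

With a target of 2 and 5 assigned containers, the model leaves 3 containers held when skipped containers are not released, and none when they are.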
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Tested in our internal Hadoop cluster.
### Was this patch authored or co-authored using generative AI tooling?
No
Closes #48238 from zuston/patch-1.
Authored-by: Junfan Zhang <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
---
.../yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala | 1 +
1 file changed, 1 insertion(+)
diff --git a/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala b/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala
index b593d73f8572..911ce2e27f96 100644
--- a/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala
+++ b/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala
@@ -820,6 +820,7 @@ private[yarn] class YarnAllocator(
             logInfo(log"Skip launching executorRunnable as running executors count: " +
               log"${MDC(LogKeys.COUNT, rpRunningExecs)} reached target executors count: " +
               log"${MDC(LogKeys.NUM_EXECUTOR_TARGET, getOrUpdateTargetNumExecutorsForRPId(rpId))}.")
+            internalReleaseContainer(container)
           }
         }
       }