This is an automated email from the ASF dual-hosted git repository.
dongjoon pushed a commit to branch branch-4.1
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/branch-4.1 by this push:
new 409e1fd68ff1 [SPARK-55134] Fix `BasicExecutorFeatureStep` to throw `IllegalArgumentException` for executor cpu misconfigs
409e1fd68ff1 is described below
commit 409e1fd68ff1e3c70b4be07a5127b2c382515788
Author: Dongjoon Hyun <[email protected]>
AuthorDate: Sat Jan 24 22:58:08 2026 +0900
[SPARK-55134] Fix `BasicExecutorFeatureStep` to throw `IllegalArgumentException` for executor cpu misconfigs
### What changes were proposed in this pull request?
This PR aims to fix `BasicExecutorFeatureStep` to throw
`IllegalArgumentException` for executor cpu misconfigs in order to fail the
Spark jobs ASAP.
### Why are the changes needed?
Since Apache Spark 4.1.0, the Spark driver pod throws `SparkException` for an
executor cpu misconfiguration before sending requests to the K8s control plane.
This improvement reduces the burden on the K8s control plane.
- #51678
```
26/01/24 06:55:31 INFO ExecutorPodsAllocator: Going to request 5 executors
from Kubernetes for ResourceProfile Id: 0, target: 5, known: 0,
sharedSlotFromPendingPods: 2147483647.
26/01/24 06:55:31 INFO ExecutorPodsAllocator: Found 0 reusable PVCs from 0
PVCs
26/01/24 06:55:31 WARN ExecutorPodsSnapshotsStoreImpl: Exception when
notifying snapshot subscriber.
org.apache.spark.SparkException: The executor cpu request (4) should be
less than or equal to cpu limit (1)
at
org.apache.spark.deploy.k8s.features.BasicExecutorFeatureStep.$anonfun$configurePod$11(BasicExecutorFeatureStep.scala:236)
```
However, the Spark driver keeps retrying to create executor pods
indefinitely unless the user sets an additional `spark.driver.timeout` configuration.
- #45313
So, we had better exit the Spark job in this case ASAP. We can do that by
simply switching `SparkException` to `IllegalArgumentException`, like the other
steps.
- https://github.com/apache/spark/pull/30084
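For illustration only, the request-vs-limit check behaves roughly like the following minimal Java sketch. `CpuCheck` is a hypothetical helper, not Spark code: the real step compares Fabric8 `Quantity` objects rather than parsing cpu strings by hand, but the fail-fast effect of throwing `IllegalArgumentException` is the same.

```java
// Minimal sketch (hypothetical CpuCheck helper, not the actual Spark code):
// compare Kubernetes cpu strings such as "500m" or "4" the way the
// request-vs-limit validation in BasicExecutorFeatureStep does, and fail
// fast with IllegalArgumentException on a misconfiguration.
public class CpuCheck {
    // Normalize a cpu string to millicores: "500m" -> 500, "4" -> 4000.
    static long toMillicores(String cpu) {
        return cpu.endsWith("m")
            ? Long.parseLong(cpu.substring(0, cpu.length() - 1))
            : (long) (Double.parseDouble(cpu) * 1000);
    }

    // Throw IllegalArgumentException when the request exceeds the limit,
    // mirroring the message wording of the patched step.
    static void validate(String request, String limit) {
        if (toMillicores(limit) < toMillicores(request)) {
            throw new IllegalArgumentException(
                "The executor cpu request (" + request + ") should be "
                + "less than or equal to cpu limit (" + limit + ")");
        }
    }
}
```

With `request.cores=4` and `limit.cores=1` as in the log above, the check throws immediately, which is what lets the driver exit at submission time instead of retrying pod creation against the control plane.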
### Does this PR introduce _any_ user-facing change?
Technically no, because those misconfigured Spark jobs previously didn't get
any resources.
### How was this patch tested?
Pass the CIs with the updated test case.
Also, I checked manually via `spark-submit`:
```
$ bin/spark-submit --master k8s://$K8S_MASTER \
--deploy-mode cluster \
-c spark.executor.instances=5 \
-c spark.kubernetes.executor.request.cores=4 \
-c spark.kubernetes.executor.limit.cores=1 \
-c spark.kubernetes.container.image=apache/spark:SPARK-55134 \
-c spark.kubernetes.authenticate.driver.serviceAccountName=spark \
-c spark.kubernetes.executor.useDriverPodIP=true \
--class org.apache.spark.examples.SparkPi \
local:///opt/spark/examples/jars/spark-examples.jar 200000
...
26/01/24 16:33:57 INFO LoggingPodStatusWatcherImpl: State changed, new
state:
pod name: org-apache-spark-examples-sparkpi-0482f19beeec7491-driver
namespace: default
labels: spark-app-name -> org-apache-spark-examples-sparkpi,
spark-app-selector -> spark-ee23f03db88b43fb906b0dbc1b04ad63, spark-role ->
driver, spark-version -> 4.2.0-SNAPSHOT
pod uid: c6d41845-5893-4135-a065-278d94500315
creation time: 2026-01-24T07:33:52Z
service account name: spark
volumes: spark-local-dir-1, spark-conf-volume-driver,
kube-api-access-8rbc8
node name: lima-rancher-desktop
start time: 2026-01-24T07:33:52Z
phase: Failed
container status:
container name: spark-kubernetes-driver
container image: apache/spark:SPARK-55134
container state: terminated
container started at: 2026-01-24T07:33:53Z
container finished at: 2026-01-24T07:33:55Z
exit code: 1
termination reason: Error
26/01/24 16:33:57 INFO LoggingPodStatusWatcherImpl: Application status for
spark-ee23f03db88b43fb906b0dbc1b04ad63 (phase: Failed)
26/01/24 16:33:57 INFO LoggingPodStatusWatcherImpl: Container final
statuses:
container name: spark-kubernetes-driver
container image: apache/spark:SPARK-55134
container state: terminated
container started at: 2026-01-24T07:33:53Z
container finished at: 2026-01-24T07:33:55Z
exit code: 1
termination reason: Error
26/01/24 16:33:57 INFO LoggingPodStatusWatcherImpl: Application
org.apache.spark.examples.SparkPi with application ID
spark-ee23f03db88b43fb906b0dbc1b04ad63 and submission ID
default:org-apache-spark-examples-sparkpi-0482f19beeec7491-driver finished
```
### Was this patch authored or co-authored using generative AI tooling?
No.
Closes #53948 from dongjoon-hyun/SPARK-55134.
Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
(cherry picked from commit ab3ec9e34441dc11372e764d6384b309af36abee)
Signed-off-by: Dongjoon Hyun <[email protected]>
---
.../apache/spark/deploy/k8s/features/BasicExecutorFeatureStep.scala | 4 ++--
.../spark/deploy/k8s/features/BasicExecutorFeatureStepSuite.scala | 2 +-
2 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicExecutorFeatureStep.scala b/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicExecutorFeatureStep.scala
index 13d1f1bc98a0..0cfa842ef396 100644
--- a/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicExecutorFeatureStep.scala
+++ b/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicExecutorFeatureStep.scala
@@ -232,8 +232,8 @@ private[spark] class BasicExecutorFeatureStep(
executorLimitCores.map { limitCores =>
val executorCpuLimitQuantity = new Quantity(limitCores)
if (executorCpuLimitQuantity.compareTo(executorCpuQuantity) < 0) {
- throw new SparkException(s"The executor cpu request ($executorCpuQuantity) should be " +
- s"less than or equal to cpu limit ($executorCpuLimitQuantity)")
+ throw new IllegalArgumentException(s"The executor cpu request ($executorCpuQuantity) " +
+ s"should be less than or equal to cpu limit ($executorCpuLimitQuantity)")
}
new ContainerBuilder(executorContainerWithConfVolume)
.editResources()
diff --git a/resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/features/BasicExecutorFeatureStepSuite.scala b/resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/features/BasicExecutorFeatureStepSuite.scala
index ced1326e7938..d264484f4d03 100644
--- a/resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/features/BasicExecutorFeatureStepSuite.scala
+++ b/resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/features/BasicExecutorFeatureStepSuite.scala
@@ -123,7 +123,7 @@ class BasicExecutorFeatureStepSuite extends SparkFunSuite with BeforeAndAfter {
test("SPARK-52933: Verify if the executor cpu request exceeds limit") {
baseConf.set(KUBERNETES_EXECUTOR_REQUEST_CORES, "2")
baseConf.set(KUBERNETES_EXECUTOR_LIMIT_CORES, "1")
- val error = intercept[SparkException] {
+ val error = intercept[IllegalArgumentException] {
initDefaultProfile(baseConf)
val step = new BasicExecutorFeatureStep(newExecutorConf(), new SecurityManager(baseConf),
defaultProfile)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]