This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
     new ab3ec9e34441 [SPARK-55134] Fix `BasicExecutorFeatureStep` to throw `IllegalArgumentException` for executor cpu misconfigs
ab3ec9e34441 is described below

commit ab3ec9e34441dc11372e764d6384b309af36abee
Author: Dongjoon Hyun <[email protected]>
AuthorDate: Sat Jan 24 22:58:08 2026 +0900

    [SPARK-55134] Fix `BasicExecutorFeatureStep` to throw `IllegalArgumentException` for executor cpu misconfigs
    
    ### What changes were proposed in this pull request?
    
    This PR aims to fix `BasicExecutorFeatureStep` to throw `IllegalArgumentException` for executor cpu misconfigs in order to fail the Spark jobs ASAP.
    
    ### Why are the changes needed?
    
    Since Apache Spark 4.1.0, the Spark driver pod throws `SparkException` for the executor cpu misconfiguration before sending the request to the K8s control plane. This improvement reduces the burden on the K8s control plane.
    - #51678
    
    ```
    26/01/24 06:55:31 INFO ExecutorPodsAllocator: Going to request 5 executors from Kubernetes for ResourceProfile Id: 0, target: 5, known: 0, sharedSlotFromPendingPods: 2147483647.
    26/01/24 06:55:31 INFO ExecutorPodsAllocator: Found 0 reusable PVCs from 0 PVCs
    26/01/24 06:55:31 WARN ExecutorPodsSnapshotsStoreImpl: Exception when notifying snapshot subscriber.
    org.apache.spark.SparkException: The executor cpu request (4) should be less than or equal to cpu limit (1)
            at org.apache.spark.deploy.k8s.features.BasicExecutorFeatureStep.$anonfun$configurePod$11(BasicExecutorFeatureStep.scala:236)
    ```
    
    However, the Spark driver keeps retrying to create executor pods anyway if the user didn't set an additional `spark.driver.timeout` configuration (see the sketch below).
    - #45313
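    
    For reference, a minimal workaround sketch under that configuration (assuming `spark.driver.timeout` is interpreted in minutes, with `0`, the default, meaning no timeout; this is based on the linked PR, not verified here):
    ```
    import org.apache.spark.SparkConf
    
    // Bound the driver lifetime so a misconfigured job eventually exits
    // instead of retrying executor pod creation forever.
    val conf = new SparkConf()
      .set("spark.driver.timeout", "10")
    ```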
    
    So, we had better exit the Spark job in this case ASAP. We can do that simply by switching `SparkException` to `IllegalArgumentException` like the other steps do, as sketched after the link below.
    
    - https://github.com/apache/spark/pull/30084
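    
    As a minimal illustration of the check (a sketch, not the exact code; see the diff below for the real change), the validation boils down to a fabric8 `Quantity` comparison:
    ```
    import io.fabric8.kubernetes.api.model.Quantity
    
    // Sketch: a cpu request of 4 against a limit of 1 now fails fast with
    // IllegalArgumentException instead of a SparkException that only surfaces
    // in the snapshot subscriber and gets retried.
    val request = new Quantity("4")
    val limit = new Quantity("1")
    if (limit.compareTo(request) < 0) {
      throw new IllegalArgumentException(s"The executor cpu request ($request) " +
        s"should be less than or equal to cpu limit ($limit)")
    }
    ```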
    
    ### Does this PR introduce _any_ user-facing change?
    
    Technically no, because those misconfigured Spark jobs previously didn't get any resources either.
    
    ### How was this patch tested?
    
    Pass the CIs with the updated test case.
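    
    The updated assertion pattern looks like this (a sketch based on the test diff below; the trailing `configurePod` call is assumed from the surrounding suite):
    ```
    // Misconfigure request.cores (2) > limit.cores (1) and expect fast failure.
    baseConf.set(KUBERNETES_EXECUTOR_REQUEST_CORES, "2")
    baseConf.set(KUBERNETES_EXECUTOR_LIMIT_CORES, "1")
    val error = intercept[IllegalArgumentException] {
      initDefaultProfile(baseConf)
      val step = new BasicExecutorFeatureStep(newExecutorConf(), new SecurityManager(baseConf),
        defaultProfile)
      step.configurePod(SparkPod.initialPod()) // assumed from the surrounding test
    }
    ```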
    
    Also, I checked manually via `spark-submit`:
    ```
    $ bin/spark-submit --master k8s://$K8S_MASTER \
    --deploy-mode cluster \
    -c spark.executor.instances=5 \
    -c spark.kubernetes.executor.request.cores=4 \
    -c spark.kubernetes.executor.limit.cores=1 \
    -c spark.kubernetes.container.image=apache/spark:SPARK-55134 \
    -c spark.kubernetes.authenticate.driver.serviceAccountName=spark \
    -c spark.kubernetes.executor.useDriverPodIP=true \
    --class org.apache.spark.examples.SparkPi \
    local:///opt/spark/examples/jars/spark-examples.jar 200000
    ...
    26/01/24 16:33:57 INFO LoggingPodStatusWatcherImpl: State changed, new state:
             pod name: org-apache-spark-examples-sparkpi-0482f19beeec7491-driver
             namespace: default
             labels: spark-app-name -> org-apache-spark-examples-sparkpi, spark-app-selector -> spark-ee23f03db88b43fb906b0dbc1b04ad63, spark-role -> driver, spark-version -> 4.2.0-SNAPSHOT
             pod uid: c6d41845-5893-4135-a065-278d94500315
             creation time: 2026-01-24T07:33:52Z
             service account name: spark
             volumes: spark-local-dir-1, spark-conf-volume-driver, kube-api-access-8rbc8
             node name: lima-rancher-desktop
             start time: 2026-01-24T07:33:52Z
             phase: Failed
             container status:
                     container name: spark-kubernetes-driver
                     container image: apache/spark:SPARK-55134
                     container state: terminated
                     container started at: 2026-01-24T07:33:53Z
                     container finished at: 2026-01-24T07:33:55Z
                     exit code: 1
                     termination reason: Error
    26/01/24 16:33:57 INFO LoggingPodStatusWatcherImpl: Application status for spark-ee23f03db88b43fb906b0dbc1b04ad63 (phase: Failed)
    26/01/24 16:33:57 INFO LoggingPodStatusWatcherImpl: Container final statuses:
             container name: spark-kubernetes-driver
             container image: apache/spark:SPARK-55134
             container state: terminated
             container started at: 2026-01-24T07:33:53Z
             container finished at: 2026-01-24T07:33:55Z
             exit code: 1
             termination reason: Error
    26/01/24 16:33:57 INFO LoggingPodStatusWatcherImpl: Application org.apache.spark.examples.SparkPi with application ID spark-ee23f03db88b43fb906b0dbc1b04ad63 and submission ID default:org-apache-spark-examples-sparkpi-0482f19beeec7491-driver finished
    ```
    
    ### Was this patch authored or co-authored using generative AI tooling?
    
    No.
    
    Closes #53948 from dongjoon-hyun/SPARK-55134.
    
    Authored-by: Dongjoon Hyun <[email protected]>
    Signed-off-by: Dongjoon Hyun <[email protected]>
---
 .../apache/spark/deploy/k8s/features/BasicExecutorFeatureStep.scala   | 4 ++--
 .../spark/deploy/k8s/features/BasicExecutorFeatureStepSuite.scala     | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicExecutorFeatureStep.scala b/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicExecutorFeatureStep.scala
index 5f61c014127a..855e404b7646 100644
--- a/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicExecutorFeatureStep.scala
+++ b/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicExecutorFeatureStep.scala
@@ -239,8 +239,8 @@ private[spark] class BasicExecutorFeatureStep(
       executorLimitCores.map { limitCores =>
         val executorCpuLimitQuantity = new Quantity(limitCores)
         if (executorCpuLimitQuantity.compareTo(executorCpuQuantity) < 0) {
-          throw new SparkException(s"The executor cpu request ($executorCpuQuantity) should be " +
-            s"less than or equal to cpu limit ($executorCpuLimitQuantity)")
+          throw new IllegalArgumentException(s"The executor cpu request ($executorCpuQuantity) " +
+            s"should be less than or equal to cpu limit ($executorCpuLimitQuantity)")
         }
         new ContainerBuilder(executorContainerWithConfVolume)
           .editResources()
diff --git a/resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/features/BasicExecutorFeatureStepSuite.scala b/resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/features/BasicExecutorFeatureStepSuite.scala
index b8b5da192a09..2a5dc864d635 100644
--- a/resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/features/BasicExecutorFeatureStepSuite.scala
+++ b/resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/features/BasicExecutorFeatureStepSuite.scala
@@ -126,7 +126,7 @@ class BasicExecutorFeatureStepSuite extends SparkFunSuite with BeforeAndAfter {
   test("SPARK-52933: Verify if the executor cpu request exceeds limit") {
     baseConf.set(KUBERNETES_EXECUTOR_REQUEST_CORES, "2")
     baseConf.set(KUBERNETES_EXECUTOR_LIMIT_CORES, "1")
-    val error = intercept[SparkException] {
+    val error = intercept[IllegalArgumentException] {
       initDefaultProfile(baseConf)
       val step = new BasicExecutorFeatureStep(newExecutorConf(), new SecurityManager(baseConf),
         defaultProfile)


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
