[ 
https://issues.apache.org/jira/browse/SPARK-34519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18033975#comment-18033975
 ] 

Oleksiy Dyagilev commented on SPARK-34519:
------------------------------------------

In our production deployment, we run multiple Spark applications simultaneously 
on Kubernetes. I would like to implement a backoff mechanism to protect the 
Control Plane from being overloaded with continuous pod allocation requests 
when pods fail to start (e.g. due to Istio sidecar problems or other 
initialization issues). We're aware that Spark 3.3 introduced 
"spark.kubernetes.allocation.maxPendingPods", which can partially limit the 
number of concurrent pending pods. However, the downside of a static limit is 
that it increases startup time even when cluster conditions are normal.

[~dongjoon], could you please review the draft proposal below and let me know 
if you have better suggestions?

-----------------

Introduce a dynamic *maxPendingLimit* that automatically adjusts based on 
system health.

Introduce three states:

*1. Normal State*

Initial state. The system operates with standard behavior.
 * *Transition to Backoff*: if *failureThreshold* failures occur within the *failureInterval* (both configurable), transition to the Backoff state.

*2. Backoff State*
 * Set *maxPendingLimit* to *initialMaxPending* (configurable, default: 1)
 * Track executor launch attempts with this reduced limit:
 ** *On failure*: apply exponential backoff before the next attempt (see the sketch after the state diagram below)
 *** Exponential backoff sequence: *initialDelay* × 2^n, capped at *maxBackoffDelay*. E.g. 5s → 10s → 20s → 40s → ... → 5 min → 5 min (*initialDelay*=5s, *maxBackoffDelay*=5min)
 *** Continue attempting with the maximum delay until successful
 ** *On success*: transition to the Recovery state

*3. Recovery State*
 * Gradually increase *maxPendingLimit* linearly by *maxPendingIncreaseStep* (configurable), respecting the upper limit defined by the existing 'spark.kubernetes.allocation.maxPendingPods'
 * Increase *maxPendingLimit* only after all executors at the current limit succeed
 * *On any failure*: immediately revert to the Backoff state
 * *Transition to Normal*: when the target number of executors is reached, i.e. all requested executors have been allocated (any better ideas?)

!spark_backoff.png|width=553,height=195!
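To make the intended behavior concrete, here is a minimal sketch of the state machine and the delay/limit computations. Everything in it is illustrative: the names (AllocState, BackoffPolicy, nextDelay, maxPendingLimit, etc.) are placeholders rather than existing Spark APIs, and in a real patch the logic would presumably live inside ExecutorPodsAllocator.

{code:scala}
import scala.concurrent.duration._

// Allocation states from the proposal above (illustrative names).
sealed trait AllocState
case object Normal extends AllocState
case class Backoff(consecutiveFailures: Int) extends AllocState
case class Recovery(currentLimit: Int) extends AllocState

// Hypothetical policy helper; defaults mirror the config table below.
class BackoffPolicy(
    failureThreshold: Int = 2,
    failureInterval: FiniteDuration = 10.minutes,
    initialDelay: FiniteDuration = 5.seconds,
    maxBackoffDelay: FiniteDuration = 5.minutes,
    initialMaxPending: Int = 1,
    maxPendingIncreaseStep: Int = 2) {

  // Normal -> Backoff trigger: at least `failureThreshold` failures within `failureInterval`.
  def shouldEnterBackoff(failureTimestampsMs: Seq[Long], nowMs: Long): Boolean =
    failureTimestampsMs.count(t => nowMs - t <= failureInterval.toMillis) >= failureThreshold

  // Delay before the n-th retry in Backoff state: initialDelay * 2^n, capped at maxBackoffDelay.
  def nextDelay(n: Int): FiniteDuration = {
    val factor = 1L << math.min(n, 30)                      // 2^n, clamped to avoid overflow
    val candidateMs = initialDelay.toMillis * factor
    math.min(candidateMs, maxBackoffDelay.toMillis).millis  // e.g. 5s, 10s, 20s, ..., 5min, 5min
  }

  // Pending-pod limit to respect in each state, given the configured hard cap
  // (spark.kubernetes.allocation.maxPendingPods).
  def maxPendingLimit(state: AllocState, configuredMax: Int): Int = state match {
    case Normal          => configuredMax                   // standard behavior
    case Backoff(_)      => initialMaxPending               // throttle hard while failing
    case Recovery(limit) => math.min(limit, configuredMax)  // linear ramp-up in progress
  }

  // Recovery step: raise the limit only once all executors at the current limit succeeded.
  def nextRecoveryLimit(currentLimit: Int, configuredMax: Int): Int =
    math.min(currentLimit + maxPendingIncreaseStep, configuredMax)
}
{code}

The idea is that the allocator would cap its pending-pod requests at maxPendingLimit(state, configuredMax) on every allocation cycle, and while in Backoff schedule the next attempt nextDelay(n) later.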

*New configs:*
|*Config*|*Description (draft version)*|*Type*|*Default value*|
|spark.kubernetes.allocation.executor.backoff.enabled|Enable backoff mechanism|Boolean|false|
|spark.kubernetes.allocation.executor.backoff.failureThreshold|Number of executor failures to trigger Backoff state|Integer|2|
|spark.kubernetes.allocation.executor.backoff.failureInterval|Time window for counting failures|Duration|10m|
|spark.kubernetes.allocation.executor.backoff.initialDelay|Starting delay for exponential backoff|Duration|5s|
|spark.kubernetes.allocation.executor.backoff.maxBackoffDelay|Maximum exponential backoff delay cap|Duration|5m|
|spark.kubernetes.allocation.executor.backoff.initialMaxPending|Initial maximum number of pending executors in Backoff state|Integer|1|
|spark.kubernetes.allocation.executor.backoff.maxPendingIncreaseStep|Linear increment step for Recovery state|Integer|2|
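If helpful, a rough sketch of how these entries could be declared with Spark's internal ConfigBuilder API follows. The object name, the file placement under org.apache.spark.deploy.k8s, and the doc strings are placeholders; `.version()` is deliberately omitted since no target release is decided.

{code:scala}
// Sketch only: proposed config entries mirroring the table above.
package org.apache.spark.deploy.k8s

import java.util.concurrent.TimeUnit

import org.apache.spark.internal.config.ConfigBuilder

private[spark] object ExecutorBackoffConfig {

  val EXECUTOR_BACKOFF_ENABLED =
    ConfigBuilder("spark.kubernetes.allocation.executor.backoff.enabled")
      .doc("Enable the executor allocation backoff mechanism.")
      .booleanConf
      .createWithDefault(false)

  val EXECUTOR_BACKOFF_FAILURE_THRESHOLD =
    ConfigBuilder("spark.kubernetes.allocation.executor.backoff.failureThreshold")
      .doc("Number of executor failures within failureInterval that triggers the Backoff state.")
      .intConf
      .createWithDefault(2)

  val EXECUTOR_BACKOFF_FAILURE_INTERVAL =
    ConfigBuilder("spark.kubernetes.allocation.executor.backoff.failureInterval")
      .doc("Time window for counting executor failures.")
      .timeConf(TimeUnit.MILLISECONDS)
      .createWithDefaultString("10m")

  val EXECUTOR_BACKOFF_INITIAL_DELAY =
    ConfigBuilder("spark.kubernetes.allocation.executor.backoff.initialDelay")
      .doc("Starting delay for the exponential backoff.")
      .timeConf(TimeUnit.MILLISECONDS)
      .createWithDefaultString("5s")

  val EXECUTOR_BACKOFF_MAX_DELAY =
    ConfigBuilder("spark.kubernetes.allocation.executor.backoff.maxBackoffDelay")
      .doc("Upper cap on the exponential backoff delay.")
      .timeConf(TimeUnit.MILLISECONDS)
      .createWithDefaultString("5m")

  val EXECUTOR_BACKOFF_INITIAL_MAX_PENDING =
    ConfigBuilder("spark.kubernetes.allocation.executor.backoff.initialMaxPending")
      .doc("Initial maximum number of pending executors while in the Backoff state.")
      .intConf
      .createWithDefault(1)

  val EXECUTOR_BACKOFF_MAX_PENDING_INCREASE_STEP =
    ConfigBuilder("spark.kubernetes.allocation.executor.backoff.maxPendingIncreaseStep")
      .doc("Linear increment applied to maxPendingLimit in the Recovery state.")
      .intConf
      .createWithDefault(2)
}
{code}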

 

> ExecutorPodsAllocator use exponential backoff strategy when request executor 
> pod failed
> ---------------------------------------------------------------------------------------
>
>                 Key: SPARK-34519
>                 URL: https://issues.apache.org/jira/browse/SPARK-34519
>             Project: Spark
>          Issue Type: Improvement
>          Components: Kubernetes
>    Affects Versions: 3.0.1
>         Environment: spark 3.0.1
> kubernetes 1.18.8
>            Reporter: Fengyu Cao
>            Priority: Minor
>         Attachments: spark_backoff.png
>
>
> # create a resource quota: `kubectl create quota test --hard=cpu=20,memory=60G`
>  # submit an application requesting more than the quota: `spark-submit 
> --executor-cores 5 --executor-memory 10G --num-executors 10 <spark-pi.py>`
>  # `ExecutorPodsAllocator: Going to request 5 executors from Kubernetes.` 
> seems to be printed every second
> `spark.kubernetes.allocation.batch.delay` defaults to 1s, which is good enough 
> when allocation succeeds, but exponential backoff may be a better choice 
> when allocation fails.



