[
https://issues.apache.org/jira/browse/YUNIKORN-2735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17865271#comment-17865271
]
Wilfred Spiegelenburg commented on YUNIKORN-2735:
-------------------------------------------------
Reservation is not an under the hood optimisation it prevents starvation for
large(r) allocations or allocations with specific resource requests that fit
only on specific nodes. If you turn it off you will see a large impact in real
world scenarios.
Disabling reservation will cause large allocations to be starved in busy
clusters and we should never do that. The variable for turning it off was
introduced during the development of the code and should have been removed.
Surprised that it has survived this long.
We have/had a TODO in the code to make this configurable. Currently it is fixed
to 2 seconds. It should be a reloadable configuration value. I would also argue
that the current 2 seconds is too quick and 30 seconds would allow us to be a
bit more eager.
I would propose the following setup:
* configuration name: service.ReservationDelay
* granularity: seconds
* default: 30 seconds
* minimum: 2 seconds (allow current behaviour)
* maximum: 3600 seconds (prevent starvation and turning off reservations)
* reloadable: true
* notes:
** old reservations are not re-evaluated when the value is changed
** settings outside the minimum..maximum range will use the default
** when reloading the value is not changed if outside the range
> YuniKorn doesn't schedule correctly after some pods were marked as
> Unschedulable
> --------------------------------------------------------------------------------
>
> Key: YUNIKORN-2735
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2735
> Project: Apache YuniKorn
> Issue Type: Bug
> Reporter: Volodymyr Kot
> Priority: Major
> Attachments: bug-logs, driver.yml, executor.yml, nodestate, podstate
>
>
> It is a bit of an edge case, but I can consistently reproduce this on master
> - see steps and comments used below:
> # Create a new cluster with kind, with 4 cpus/8Gb of memory
> # Deploy YuniKorn using helm
> # Set up service account for Spark
> ## "kubectl create serviceaccount spark"
> ## "kubectl create clusterrolebinding spark-role --clusterrole=edit
> --serviceaccount=default:spark --namespace=default"
> # Run kubectl proxy" to be able to run spark-submit
> # Create Spark application* 1 with driver and 2 executors - fits fully,
> placeholders are created and replaced
> # Create Spark application 2 with driver and 2 executors - only one executor
> placeholder is scheduled, rest of the pods are marked Unschedulable
> # Delete one of the executors from application 1
> # Spark driver re-creates the executor, it is marked as unschedulable
>
> At that point scheduler is "stuck", and won't schedule either executor from
> application 1 OR placeholder for executor from application 2 - it deems both
> of those unschedulable. See logs below, and please let me know if I
> misunderstood something/it is expected behavior!
>
> *Script used to run spark-submit:
> {code:java}
> ${SPARK_HOME}/bin/spark-submit --master k8s://http://localhost:8001
> --deploy-mode cluster --name spark-pi \
> --master k8s://http://localhost:8001 --deploy-mode cluster --name spark-pi
> \
> --class org.apache.spark.examples.SparkPi \
> --conf spark.executor.instances=2 \
> --conf spark.kubernetes.executor.request.cores=0.5 \
> --conf spark.kubernetes.container.image=docker.io/apache/spark:v3.4.0 \
> --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
> --conf spark.kubernetes.driver.podTemplateFile=./driver.yml \
> --conf spark.kubernetes.executor.podTemplateFile=./executor.yml \
> local:///opt/spark/examples/jars/spark-examples_2.12-3.4.0.jar 30000 {code}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]