[
https://issues.apache.org/jira/browse/YUNIKORN-2735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17865326#comment-17865326
]
Volodymyr Kot commented on YUNIKORN-2735:
-----------------------------------------
Hey [~wilfreds], thanks for the thoughtful response here! I understand this
functionality is generally useful, and I agree that it should be on by default.
What I'm saying is that it is not _universally_ useful. As discussed on Slack,
this is a minimal repro of a problem that I found during real-world testing -
specifically in a cluster whose queues don't overlap (due to our node
labelling/pod affinity setup). In this case, a queue with a FIFO policy is
sufficient to prevent starvation (aided by the binpacking policy and
autoscaling), and the reservation flow is actively harming utilization. Based on
the Slack discussion, [~eladdolev] seems to have exactly the same problem.
Happy to contribute the above, but is there a world where you would still allow
node reservations to be turned off completely? I imagined a toggle similar to
[https://yunikorn.apache.org/docs/user_guide/service_config#servicedisablegangscheduling],
with "not recommended" / "I know what I am doing" messaging.
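To make the idea concrete, here is a rough sketch of how such a toggle might
sit next to the existing one in the yunikorn-configs ConfigMap. Only
service.disableGangScheduling is a real setting (the one linked above). The
commented-out reservation key is purely hypothetical: its name is my assumption
and no such option exists in YuniKorn today. The namespace assumes a default
helm install.
{code:yaml}
apiVersion: v1
kind: ConfigMap
metadata:
  name: yunikorn-configs
  namespace: yunikorn   # assumption: default namespace from the helm install
data:
  # Existing toggle, shown with its default value.
  service.disableGangScheduling: "false"
  # HYPOTHETICAL key, for illustration only; not an existing YuniKorn option.
  # Would default to "false" and be documented as "not recommended".
  # service.disableNodeReservation: "true"
{code}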
> YuniKorn doesn't schedule correctly after some pods were marked as
> Unschedulable
> --------------------------------------------------------------------------------
>
> Key: YUNIKORN-2735
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2735
> Project: Apache YuniKorn
> Issue Type: Bug
> Reporter: Volodymyr Kot
> Priority: Major
> Attachments: bug-logs, driver.yml, executor.yml, nodestate, podstate
>
>
> It is a bit of an edge case, but I can consistently reproduce this on master
> - see steps and comments used below:
> # Create a new cluster with kind, with 4 CPUs/8 GB of memory
> # Deploy YuniKorn using helm
> # Set up service account for Spark
> ## "kubectl create serviceaccount spark"
> ## "kubectl create clusterrolebinding spark-role --clusterrole=edit
> --serviceaccount=default:spark --namespace=default"
> # Run "kubectl proxy" to be able to run spark-submit
> # Create Spark application* 1 with driver and 2 executors - fits fully,
> placeholders are created and replaced
> # Create Spark application 2 with driver and 2 executors - only one executor
> placeholder is scheduled, rest of the pods are marked Unschedulable
> # Delete one of the executors from application 1
> # Spark driver re-creates the executor, it is marked as unschedulable
>
> At that point the scheduler is "stuck" and won't schedule either the executor
> from application 1 OR the placeholder for the executor from application 2 - it
> deems both of those unschedulable. See logs below, and please let me know if I
> misunderstood something or if this is expected behavior!
>
> *Script used to run spark-submit:
> {code:java}
> ${SPARK_HOME}/bin/spark-submit \
> --master k8s://http://localhost:8001 --deploy-mode cluster --name spark-pi \
> --class org.apache.spark.examples.SparkPi \
> --conf spark.executor.instances=2 \
> --conf spark.kubernetes.executor.request.cores=0.5 \
> --conf spark.kubernetes.container.image=docker.io/apache/spark:v3.4.0 \
> --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
> --conf spark.kubernetes.driver.podTemplateFile=./driver.yml \
> --conf spark.kubernetes.executor.podTemplateFile=./executor.yml \
> local:///opt/spark/examples/jars/spark-examples_2.12-3.4.0.jar 30000 {code}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)