[ https://issues.apache.org/jira/browse/YUNIKORN-2735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17865366#comment-17865366 ]

Elad Dolev commented on YUNIKORN-2735:
--------------------------------------

Thank you everyone for your insights here, those are very helpful.

I'm not sure whether the case here accurately describes the issue we're facing 
with our particular setup, but the current reservation behavior effectively 
prevents us from using YuniKorn.

I'm happy to create a new ticket that better explains our situation if needed, 
but I suspect that adding some kind of configuration option could solve it for 
us without breaking any current behavior.

If the scheduler could be configured to "let go" after a configurable amount of 
time and cancel a reservation, that would clear up the deadlock for us.

I reckon there are many different ways of introducing such logic, and one of 
them might actually work well.

In our specific setup, we have different parent queues for different machine 
types that run different sets of workloads. For some queues (machine types) we 
never need gang scheduling, and for others we always need it.

For example, in our use case, logic stating that after X minutes reservations 
are disabled on a worker node for a specific scheduling cycle (and then 
re-enabled, with the countdown starting again) would be sufficient.

In our specific use case, completely excluding workloads in a gang (that is, 
never cancelling a reservation made for a gang member) would work, as would any 
other kind of queue-level configuration: for example, disabling gang scheduling 
for a queue, disabling reservations for a queue, or expiring reservations after 
X minutes for a queue.
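Purely as an illustration of what such a queue-level knob could look like, something along these lines might express per-queue behavior. Note that the `reservationTimeout` property is hypothetical and is not part of YuniKorn's current queue configuration:

```yaml
partitions:
  - name: default
    queues:
      - name: root
        queues:
          - name: batch            # machine type that always runs gang workloads
            properties:
              # hypothetical property -- not in today's YuniKorn config
              reservationTimeout: "never"
          - name: interactive      # machine type that never needs gang scheduling
            properties:
              reservationTimeout: "5m"
```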

Thanks again, and I'm happy to hear your thoughts here.

> YuniKorn doesn't schedule correctly after some pods were marked as 
> Unschedulable
> --------------------------------------------------------------------------------
>
>                 Key: YUNIKORN-2735
>                 URL: https://issues.apache.org/jira/browse/YUNIKORN-2735
>             Project: Apache YuniKorn
>          Issue Type: Bug
>            Reporter: Volodymyr Kot
>            Priority: Major
>         Attachments: bug-logs, driver.yml, executor.yml, nodestate, podstate
>
>
> It is a bit of an edge case, but I can consistently reproduce this on master 
> - see the steps and commands below:
>  # Create a new cluster with kind, with 4 cpus/8Gb of memory
>  # Deploy YuniKorn using helm
>  # Set up service account for Spark
>  ## "kubectl create serviceaccount spark"
>  ## "kubectl create clusterrolebinding spark-role --clusterrole=edit 
> --serviceaccount=default:spark --namespace=default"
>  # Run "kubectl proxy" to be able to run spark-submit
>  # Create Spark application* 1 with driver and 2 executors - fits fully, 
> placeholders are created and replaced
>  # Create Spark application 2 with driver and 2 executors - only one executor 
> placeholder is scheduled, rest of the pods are marked Unschedulable
>  # Delete one of the executors from application 1
>  # Spark driver re-creates the executor, it is marked as unschedulable
>  
> At that point scheduler is "stuck", and won't schedule either executor from 
> application 1 OR placeholder for executor from application 2 - it deems both 
> of those unschedulable. See logs below, and please let me know if I 
> misunderstood something/it is expected behavior!
>  
> *Script used to run spark-submit:
> {code:bash}
> ${SPARK_HOME}/bin/spark-submit \
>    --master k8s://http://localhost:8001 --deploy-mode cluster --name spark-pi \
>    --class org.apache.spark.examples.SparkPi \
>    --conf spark.executor.instances=2 \
>    --conf spark.kubernetes.executor.request.cores=0.5 \
>    --conf spark.kubernetes.container.image=docker.io/apache/spark:v3.4.0 \
>    --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
>    --conf spark.kubernetes.driver.podTemplateFile=./driver.yml \
>    --conf spark.kubernetes.executor.podTemplateFile=./executor.yml \
>    local:///opt/spark/examples/jars/spark-examples_2.12-3.4.0.jar 30000 {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
