[ 
https://issues.apache.org/jira/browse/YUNIKORN-2735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Volodymyr Kot updated YUNIKORN-2735:
------------------------------------
    Description: 
It is a bit of an edge case, but I can consistently reproduce this on master - 
see the steps and commands used below:
 # Create a new cluster with kind, with 4 CPUs / 8 GB of memory
 # Deploy YuniKorn using helm
 # Set up a service account for Spark
 ## "kubectl create serviceaccount spark"
 ## "kubectl create clusterrolebinding spark-role --clusterrole=edit 
--serviceaccount=default:spark --namespace=default"
 # Run "kubectl proxy" to be able to run spark-submit
 # Create Spark application 1* with a driver and 2 executors - it fits fully; 
placeholders are created and replaced
 # Create Spark application 2 with a driver and 2 executors - only one executor 
placeholder is scheduled; the rest of the pods are marked Unschedulable
 # Delete one of the executors from application 1
 # The Spark driver re-creates the executor; it is marked Unschedulable

 

At that point the scheduler is "stuck": it won't schedule either the executor 
from application 1 OR the placeholder for the executor from application 2 - it 
deems both unschedulable. See the logs below, and please let me know if I 
misunderstood something or this is expected behavior!
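For reference, steps 1-4 above can be sketched as the following commands. This is an illustrative reconstruction, not the exact commands from the issue: the kind cluster name, helm release name/namespace, and proxy port are assumptions, and the 4 CPU / 8 GB sizing comes from the Docker host rather than a kind flag.

```shell
# Step 1: create a kind cluster. Its capacity is inherited from the Docker
# host, so cap the host (or Docker Desktop VM) at 4 CPUs / 8 GB.
kind create cluster --name yunikorn-test

# Step 2: deploy YuniKorn using helm (official release chart repo).
helm repo add yunikorn https://apache.github.io/yunikorn-release
helm repo update
helm install yunikorn yunikorn/yunikorn --namespace yunikorn --create-namespace

# Step 3: set up the service account for Spark.
kubectl create serviceaccount spark
kubectl create clusterrolebinding spark-role --clusterrole=edit \
  --serviceaccount=default:spark --namespace=default

# Step 4: proxy the API server so spark-submit can reach
# k8s://http://localhost:8001.
kubectl proxy --port=8001 &
```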

 

*Script used to run spark-submit:
{code:java}
${SPARK_HOME}/bin/spark-submit \
   --master k8s://http://localhost:8001 --deploy-mode cluster --name spark-pi \
   --class org.apache.spark.examples.SparkPi \
   --conf spark.executor.instances=2 \
   --conf spark.kubernetes.executor.request.cores=0.5 \
   --conf spark.kubernetes.container.image=docker.io/apache/spark:v3.4.0 \
   --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
   --conf spark.kubernetes.driver.podTemplateFile=./driver.yml \
   --conf spark.kubernetes.executor.podTemplateFile=./executor.yml \
   local:///opt/spark/examples/jars/spark-examples_2.12-3.4.0.jar 30000 {code}
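The actual driver.yml and executor.yml are attached to the issue and not reproduced here. For context, a minimal YuniKorn gang-scheduling driver template usually declares the executor task group via annotations, from which the scheduler creates the placeholder pods; the sketch below is a hypothetical example (group name and minResource values are assumptions), not the attached files:

```shell
# Illustrative sketch only -- the real driver.yml is attached to the issue.
# The task-groups annotation tells YuniKorn to reserve placeholders for the
# executors (minMember/minResource), which are later replaced by real pods.
cat > driver.yml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  annotations:
    yunikorn.apache.org/task-group-name: spark-driver
    yunikorn.apache.org/task-groups: |-
      [{
        "name": "spark-executor",
        "minMember": 2,
        "minResource": {"cpu": "500m", "memory": "512M"}
      }]
spec:
  schedulerName: yunikorn
EOF
```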

  was:
It is a bit of an edge case, but I can consistently reproduce this on master - 
see the steps and commands used below:
 # Create a new cluster with kind, with 4 CPUs / 8 GB of memory
 # Deploy YuniKorn using helm
 # Set up a service account for Spark
 ## "kubectl create serviceaccount spark"
 ## "kubectl create clusterrolebinding spark-role --clusterrole=edit 
--serviceaccount=default:spark --namespace=default"
 # Run "kubectl proxy" to be able to run spark-submit
 # Create Spark application 1* with a driver and 2 executors - it fits fully; 
placeholders are created and replaced
 # Create Spark application 2 with a driver and 2 executors - only one executor 
placeholder is scheduled; the rest of the pods are marked Unschedulable
 # Delete one of the executors from application 1
 # The Spark driver re-creates the executor; it is marked Unschedulable

 

At that point the scheduler is "stuck": it won't schedule either the executor 
from application 1 OR the placeholder for the executor from application 2 - it 
deems both unschedulable. See the logs below, and please let me know if I 
misunderstood something or this is expected behavior!

 

*Script used to run spark-submit:
{code:java}
${SPARK_HOME}/bin/spark-submit \
   --master k8s://http://localhost:8001 --deploy-mode cluster --name spark-pi \
   --class org.apache.spark.examples.SparkPi \
   --conf spark.executor.instances=2 \
   --conf spark.kubernetes.executor.request.cores=0.5 \
   --conf spark.kubernetes.container.image=docker.io/apache/spark:v3.4.0 \
   --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
   --conf 
spark.kubernetes.driver.podTemplateFile=/Volumes/git/future/driver.yml \
   --conf 
spark.kubernetes.executor.podTemplateFile=/Volumes/git/future/executor.yml \
   local:///opt/spark/examples/jars/spark-examples_2.12-3.4.0.jar 30000 {code}


> YuniKorn doesn't schedule correctly after some pods were marked as 
> Unschedulable
> --------------------------------------------------------------------------------
>
>                 Key: YUNIKORN-2735
>                 URL: https://issues.apache.org/jira/browse/YUNIKORN-2735
>             Project: Apache YuniKorn
>          Issue Type: Bug
>            Reporter: Volodymyr Kot
>            Priority: Major
>         Attachments: bug-logs, driver.yml, executor.yml, nodestate, podstate
>
>
> It is a bit of an edge case, but I can consistently reproduce this on master 
> - see the steps and commands used below:
>  # Create a new cluster with kind, with 4 CPUs / 8 GB of memory
>  # Deploy YuniKorn using helm
>  # Set up a service account for Spark
>  ## "kubectl create serviceaccount spark"
>  ## "kubectl create clusterrolebinding spark-role --clusterrole=edit 
> --serviceaccount=default:spark --namespace=default"
>  # Run "kubectl proxy" to be able to run spark-submit
>  # Create Spark application 1* with a driver and 2 executors - it fits fully; 
> placeholders are created and replaced
>  # Create Spark application 2 with a driver and 2 executors - only one 
> executor placeholder is scheduled; the rest of the pods are marked 
> Unschedulable
>  # Delete one of the executors from application 1
>  # The Spark driver re-creates the executor; it is marked Unschedulable
>  
> At that point the scheduler is "stuck": it won't schedule either the executor 
> from application 1 OR the placeholder for the executor from application 2 - 
> it deems both unschedulable. See the logs below, and please let me know if I 
> misunderstood something or this is expected behavior!
>  
> *Script used to run spark-submit:
> {code:java}
> ${SPARK_HOME}/bin/spark-submit \
>    --master k8s://http://localhost:8001 --deploy-mode cluster --name spark-pi \
>    --class org.apache.spark.examples.SparkPi \
>    --conf spark.executor.instances=2 \
>    --conf spark.kubernetes.executor.request.cores=0.5 \
>    --conf spark.kubernetes.container.image=docker.io/apache/spark:v3.4.0 \
>    --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
>    --conf spark.kubernetes.driver.podTemplateFile=./driver.yml \
>    --conf spark.kubernetes.executor.podTemplateFile=./executor.yml \
>    local:///opt/spark/examples/jars/spark-examples_2.12-3.4.0.jar 30000 {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
