jameschen1519 opened a new issue #90: Completed (and sometimes deleted) pods 
are still marked as "Running" and consume resources
URL: https://github.com/apache/incubator-yunikorn-core/issues/90
 
 
   When running Spark jobs via spark-submit on Kubernetes with the Yunikorn 
scheduler handling a driver/executor pairing, the scheduler does not mark the 
jobs as complete upon termination; they remain in the "Running" state. These 
stale jobs hold resources in the queues to which the driver/executors are 
assigned, eventually causing resource starvation until the drivers are 
manually deleted. Unfortunately, deleting these pods does not necessarily free 
the resources either: after enough cycles of starting and stopping 
Yunikorn-scheduled Spark pods, all queue resources end up consumed even when 
no Yunikorn-scheduled Spark pods remain.
   
   (It is also worth noting that the driver and executor pods enter the same 
queue regardless of what the executor podTemplateFile specifies. We are unsure 
whether this is a feature or a bug.)
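
   For completeness, this is how we confirm the mismatch: the pods themselves 
report a terminal phase while the scheduler still counts them against the queue. 
The live command is in the comment; the runnable part applies the same filter to 
a fabricated sample of `kubectl get pods -o json` output (the sample JSON is 
made up for illustration, assuming the driver/executor pods reach Succeeded):
   ```shell
    # Live check (requires the cluster):
    #   kubectl -n test get pods -o json | grep -o '"phase":"[A-Za-z]*"' | sort | uniq -c
    # Fabricated sample: two pods already Succeeded, one still Running.
    sample='{"items":[{"status":{"phase":"Succeeded"}},{"status":{"phase":"Succeeded"}},{"status":{"phase":"Running"}}]}'
    # Count pods per phase; on an affected cluster the Succeeded pods still
    # show up as allocated resources in the Yunikorn queues.
    out=$(printf '%s\n' "$sample" | grep -o '"phase":"[A-Za-z]*"' | sort | uniq -c)
    echo "$out"
   ```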
   
   Listed below are the reproduction steps. Please let me know if any 
clarification is needed; thanks.
   
   ~~~~~~~~~~~~~~~~~~~~~~~~
   
   Environment setup:
   For setting up the Yunikorn pods (we used the Helm chart, but also tried 
setting it up manually):
   `helm install --generate-name --namespace test ./yunikorn`
   
   queues.yaml snippet to be applied to the Yunikorn configmap with `kubectl -n 
test edit configmap yunikorn-scheduler`:
   ```
     queues.yaml: |
       partitions:
         - name: default
           placementrules:
             - name: provided
               create: false
           queues:
             - name: root
               submitacl: '*'
               queues:
                 - name: driver
                   resources:
                     guaranteed:
                       memory: 10000
                       vcore: 1000
                     max:
                       memory: 40000
                       vcore: 9000
                 - name: executors
                   resources:
                     guaranteed:
                       memory: 1000
                       vcore: 1000
                     max:
                       memory: 15000
                       vcore: 6000
   
   ```
   
   Command used:
   
   ```
    spark-submit \
      --master k8s://https://<YOUR K8S IP>:6443 \
      --deploy-mode cluster \
      --name spark-pi \
      --class org.apache.spark.examples.SparkPi \
      --conf spark.kubernetes.container.image=<YOUR SPARK IMAGE> \
      --conf spark.kubernetes.namespace=test \
      --conf spark.driver.extraClassPath=/opt/hadoop/etc/hadoop \
      --conf spark.ssl.enabled=false \
      --conf spark.authenticate=false \
      --conf spark.kubernetes.driver.podTemplateFile=/driver.yaml \
      --conf spark.kubernetes.executor.podTemplateFile=/executor.yaml \
      --conf spark.network.crypto.enabled=false \
      <YOUR SPARK JAR FILE; E.G. hdfs://<YOUR HDFS URL>/sparkexample.jar> &
   ```
   
   Under driver.yaml:
   ```
   apiVersion: v1
   kind: Pod
   metadata:
     labels:
       spark-app-id: spark-00001
       queue: root.driver
   spec:
     schedulerName: yunikorn
   
   ```
   
   Under executor.yaml:
   ```
   apiVersion: v1
   kind: Pod
   metadata:
     labels:
       spark-app-id: spark-00001
       queue: root.executors
   spec:
     schedulerName: yunikorn
   ```
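
   Regarding the queue-placement note above: the two templates plainly request 
different queues, yet both pods land in the same one. A small sketch that 
extracts the `queue` label from copies of the templates (inlined verbatim from 
the two files above); the live comparison command is in the comment:
   ```shell
    # On the live cluster, compare against where the pods actually landed:
    #   kubectl -n test get pods -o jsonpath='{range .items[*]}{.metadata.name} {.metadata.labels.queue}{"\n"}{end}'
    # Template contents copied verbatim from driver.yaml and executor.yaml above.
    driver_tmpl='apiVersion: v1
    kind: Pod
    metadata:
      labels:
        spark-app-id: spark-00001
        queue: root.driver
    spec:
      schedulerName: yunikorn'
    executor_tmpl='apiVersion: v1
    kind: Pod
    metadata:
      labels:
        spark-app-id: spark-00001
        queue: root.executors
    spec:
      schedulerName: yunikorn'
    # Extract the queue label each template requests.
    driver_q=$(printf '%s\n' "$driver_tmpl" | grep -o 'queue: [a-z.]*')
    executor_q=$(printf '%s\n' "$executor_tmpl" | grep -o 'queue: [a-z.]*')
    echo "driver requests \"$driver_q\"; executor requests \"$executor_q\""
   ```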

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
