zhongwei liu created SPARK-31173:
------------------------------------

             Summary: Spark Kubernetes add tolerations and nodeName support
                 Key: SPARK-31173
                 URL: https://issues.apache.org/jira/browse/SPARK-31173
             Project: Spark
          Issue Type: New Feature
          Components: Kubernetes
    Affects Versions: 3.1.0, 2.4.6
         Environment: Alibaba Cloud ACK with Spark operator
                      (v1beta2-1.1.0-2.4.5) and Spark (2.4.5)
            Reporter: zhongwei liu


When you run Spark on a serverless Kubernetes cluster (virtual-kubelet), you 
need to specify nodeSelectors, tolerations, and even nodeName to get better 
scheduling performance. Spark currently does not support tolerations. To use 
them, you must rely on an admission controller webhook to decorate the pod, 
but the performance of that approach is extremely poor. Here is the benchmark:

With webhook:

Batch size: 500   Pod creation: about 7 Pods/s   All Pods running: 5 min

Without webhook:

Batch size: 500   Pod creation: more than 500 Pods/s   All Pods running: 45 s

Adding tolerations and nodeName support to Spark would be a great help when 
you want to run a large-scale job on a serverless Kubernetes cluster.
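
To make the request concrete, here is a minimal sketch of the pod decoration 
that the webhook performs today and that Spark could apply itself when it 
builds the pod. It only manipulates a pod manifest as a plain dictionary. The 
function name decorate_pod, the label "type: virtual-kubelet", the taint key 
"virtual-kubelet.io/provider" and the node name "virtual-kubelet" are 
illustrative assumptions, not values taken from this issue or from Spark; 
only the spec fields nodeSelector, tolerations and nodeName are standard 
Kubernetes Pod fields.

```python
import copy
import json


def decorate_pod(pod, node_selector=None, tolerations=None, node_name=None):
    """Return a copy of a pod manifest with scheduling fields set on its spec.

    Hypothetical helper for illustration; not part of Spark or Kubernetes.
    """
    decorated = copy.deepcopy(pod)  # leave the caller's manifest untouched
    spec = decorated.setdefault("spec", {})
    if node_selector:
        spec.setdefault("nodeSelector", {}).update(node_selector)
    if tolerations:
        spec.setdefault("tolerations", []).extend(tolerations)
    if node_name:
        # Setting nodeName binds the pod to a node directly.
        spec["nodeName"] = node_name
    return decorated


# A stripped-down executor pod; names and image are placeholders.
executor_pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "spark-exec-1"},
    "spec": {"containers": [{"name": "executor", "image": "spark:2.4.5"}]},
}

decorated = decorate_pod(
    executor_pod,
    node_selector={"type": "virtual-kubelet"},        # assumed label
    tolerations=[{
        "key": "virtual-kubelet.io/provider",         # assumed taint key
        "operator": "Exists",
        "effect": "NoSchedule",
    }],
    node_name="virtual-kubelet",                      # assumed node name
)

print(json.dumps(decorated["spec"], indent=2))
```

If Spark set these fields while constructing the driver and executor pods, 
no admission webhook call would be needed for each pod creation.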
