Hi, I would like to share my experience with Spark 3.4.1 running on Kubernetes Autopilot, which some refer to as serverless Kubernetes.
My current experience is on Google GKE Autopilot (https://cloud.google.com/kubernetes-engine/docs/concepts/autopilot-overview). Essentially, you specify the cluster name and region and the CSP takes care of the rest. FYI, I am running Java 11 and Spark 3.4.1 on the host doing the spark-submit. The Docker image is also built on Java 11, Spark 3.4.1 and PySpark, as the tag indicates:

spark-py:3.4.1-scala_2.12-11-jre-slim-buster-java11PlusPackages

The problem I notice is that the cluster starts with an e2-medium node, which has 4 GB of memory:

NAME          LOCATION      MASTER_VERSION   MASTER_IP       MACHINE_TYPE  NODE_VERSION     NUM_NODES  STATUS
spark-on-gke  europe-west2  1.27.2-gke.1200  34.147.184.xxx  e2-medium     1.27.2-gke.1200  1          RUNNING

This means the driver starts with that configuration, and at times it takes more than three minutes for the driver to go into the RUNNING state. In contrast, I do not see this problem with Spark 3.1.1 and Java 8, both on the host and in the Docker image. Any reason why this is happening, taking into account that Java 11 and Spark 3.4.1 consume more resources? Essentially, is Autopilot a good fit for Spark?

The spark-submit command is shown below:

spark-submit --verbose \
   --properties-file ${property_file} \
   --master k8s://https://$KUBERNETES_MASTER_IP:443 \
   --deploy-mode cluster \
   --name $APPNAME \
   --py-files $CODE_DIRECTORY_CLOUD/spark_on_gke.zip \
   --conf spark.kubernetes.namespace=$NAMESPACE \
   --conf spark.network.timeout=300 \
   --conf spark.kubernetes.allocation.batch.size=3 \
   --conf spark.kubernetes.allocation.batch.delay=1 \
   --conf spark.kubernetes.driver.container.image=${IMAGEDRIVER} \
   --conf spark.kubernetes.executor.container.image=${IMAGEDRIVER} \
   --conf spark.kubernetes.driver.pod.name=$APPNAME \
   --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark-bq \
   --conf spark.driver.extraJavaOptions="-Dio.netty.tryReflectionSetAccessible=true" \
   --conf spark.executor.extraJavaOptions="-Dio.netty.tryReflectionSetAccessible=true" \
   --conf spark.dynamicAllocation.enabled=true \
   --conf spark.dynamicAllocation.shuffleTracking.enabled=true \
   --conf spark.dynamicAllocation.shuffleTracking.timeout=20s \
   --conf spark.dynamicAllocation.executorIdleTimeout=30s \
   --conf spark.dynamicAllocation.cachedExecutorIdleTimeout=40s \
   --conf spark.dynamicAllocation.minExecutors=0 \
   --conf spark.dynamicAllocation.maxExecutors=20 \
   $CODE_DIRECTORY_CLOUD/${APPLICATION}

Mich Talebzadeh,
Solutions Architect/Engineering Lead
London
United Kingdom

view my Linkedin profile (https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/)
https://en.everybodywiki.com/Mich_Talebzadeh
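PS. To work out where the three minutes go while the driver pod sits in Pending, I have been looking at the pod events. A minimal check, assuming the driver pod name is $APPNAME as set via spark.kubernetes.driver.pod.name above:

# Events section at the bottom shows scheduling/scale-up activity
kubectl describe pod $APPNAME -n $NAMESPACE

# all recent events in the namespace, oldest first
kubectl get events -n $NAMESPACE --sort-by=.lastTimestamp

If the delay is Autopilot provisioning a new node for the pod, I would expect to see a TriggeredScaleUp event from the cluster autoscaler before the pod gets scheduled; if not, the time is being spent elsewhere (image pull, for example).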
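One more thing I plan to try: since Autopilot sizes nodes from the pod's resource requests, setting the driver's CPU and memory explicitly rather than relying on the defaults might get a suitably sized node provisioned up front. A sketch of the extra flags, where the 2-core/4g values are just a starting point for testing, not tuned numbers:

   --conf spark.kubernetes.driver.request.cores=2 \
   --conf spark.kubernetes.driver.limit.cores=2 \
   --conf spark.driver.memory=4g \

Note that the driver pod's memory request is derived from spark.driver.memory plus the memory overhead, so both feed into the node size Autopilot picks.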