Hi,

I would like to share my experience with Spark 3.4.1 running on Kubernetes
autopilot, or what some refer to as serverless.

My current experience is with Google GKE Autopilot
<https://cloud.google.com/kubernetes-engine/docs/concepts/autopilot-overview>.
Essentially, you specify the cluster name and region and the CSP takes care
of the rest. FYI, I am running Java 11 and Spark 3.4.1 on the host submitting
spark-submit. The Docker image is also built with Java 11, Spark 3.4.1 and
PySpark.

The image tag explains it:

spark-py:3.4.1-scala_2.12-11-jre-slim-buster-java11PlusPackages
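
For completeness, creating the Autopilot cluster itself is a one-liner (a
sketch; the cluster name, region and ${PROJECT_ID} here are placeholders):

    # Autopilot: you only name the cluster and pick a region;
    # Google provisions and scales the nodes for you
    gcloud container clusters create-auto spark-on-gke \
        --region=europe-west2 \
        --project=${PROJECT_ID}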

The problem I notice is that the cluster starts with an e2-medium node, which has only 4GB of memory:

NAME          LOCATION      MASTER_VERSION   MASTER_IP       MACHINE_TYPE  NODE_VERSION     NUM_NODES  STATUS
spark-on-gke  europe-west2  1.27.2-gke.1200  34.147.184.xxx  e2-medium     1.27.2-gke.1200  1          RUNNING

Meaning that the driver starts with that configuration, and at times it takes
more than three minutes for the driver to go into the RUNNING state. In
contrast, I do not see this problem with Spark 3.1.1 and Java 8, both on the
host and in the Docker image. Any reason why this is happening, taking into
account that Java 11 and Spark 3.4.1 consume more resources? Essentially, is
Autopilot a good fit for Spark?
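
To see where the time goes while the driver sits in Pending, one can watch
the pod events (a sketch, reusing the $APPNAME and $NAMESPACE variables from
the spark-submit below):

    # If the events show FailedScheduling followed by TriggeredScaleUp and a
    # new node joining, the wait is Autopilot node provisioning, not Spark
    kubectl describe pod $APPNAME -n $NAMESPACE
    kubectl get events -n $NAMESPACE --sort-by=.lastTimestamp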

The spark-submit command is shown below:

        spark-submit --verbose \
           --properties-file ${property_file} \
           --master k8s://https://$KUBERNETES_MASTER_IP:443 \
           --deploy-mode cluster \
           --name $APPNAME \
           --py-files $CODE_DIRECTORY_CLOUD/spark_on_gke.zip \
           --conf spark.kubernetes.namespace=$NAMESPACE \
           --conf spark.network.timeout=300 \
           --conf spark.kubernetes.allocation.batch.size=3 \
           --conf spark.kubernetes.allocation.batch.delay=1 \
           --conf spark.kubernetes.driver.container.image=${IMAGEDRIVER} \
           --conf spark.kubernetes.executor.container.image=${IMAGEDRIVER} \
           --conf spark.kubernetes.driver.pod.name=$APPNAME \
           --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark-bq \
           --conf spark.driver.extraJavaOptions="-Dio.netty.tryReflectionSetAccessible=true" \
           --conf spark.executor.extraJavaOptions="-Dio.netty.tryReflectionSetAccessible=true" \
           --conf spark.dynamicAllocation.enabled=true \
           --conf spark.dynamicAllocation.shuffleTracking.enabled=true \
           --conf spark.dynamicAllocation.shuffleTracking.timeout=20s \
           --conf spark.dynamicAllocation.executorIdleTimeout=30s \
           --conf spark.dynamicAllocation.cachedExecutorIdleTimeout=40s \
           --conf spark.dynamicAllocation.minExecutors=0 \
           --conf spark.dynamicAllocation.maxExecutors=20 \
           $CODE_DIRECTORY_CLOUD/${APPLICATION}
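
One option that may help is to declare the driver's and executors' resources
explicitly, so Autopilot can size the nodes up front rather than scheduling
onto the default e2-medium. A sketch of the extra --conf flags (the values
here are illustrative, not tested on this cluster):

           --conf spark.driver.memory=4g \
           --conf spark.kubernetes.driver.request.cores=2 \
           --conf spark.executor.memory=4g \
           --conf spark.kubernetes.executor.request.cores=2 \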



Mich Talebzadeh,
Solutions Architect/Engineering Lead
London
United Kingdom


   View my LinkedIn profile
<https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>


 https://en.everybodywiki.com/Mich_Talebzadeh



*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.
