Hello Karan, I am using open-source Spark on Kubernetes and the MapR Spark bundle on YARN.
Launching the job takes the same ~10 seconds with both approaches.
For shuffle, I am using local storage on both YARN and Kubernetes.

On Apr 11, 2025, at 11:24 AM, karan alang <karan.al...@gmail.com> wrote:
Hi Prem,
Which distribution of Spark are you using? How long does it take to launch the job? For Spark shuffle, what approach are you using: storing shuffle data in MinIO or using a host path?
Regards,
Karan

Hello Team,

I have a peculiar case of Spark slowness. I am using MinIO as object storage, from which Spark reads and writes data. With YARN as the master, a Spark job takes ~5 mins; the same job with Kubernetes as the master takes ~8 mins.
I checked the Spark DAG in both and observed the same number of jobs, stages, and tasks. I am using the same machines for both YARN and Kubernetes.
One observation: when I disabled Spark dynamic allocation (spark.dynamicAllocation.enabled=false) and used a static allocation instead, the Kubernetes-based Spark job took ~5.5 mins.
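For reference, a static-allocation setup like the one described above might be passed to spark-submit roughly as follows. This is a minimal sketch, assuming Python for the helper. The helper names (static_allocation_conf, to_submit_args) and the executor count, cores, and memory values are hypothetical placeholders, not settings from this thread. Only the property names are standard Spark configuration keys.

```python
# Hypothetical helpers: build spark-submit --conf arguments for a fixed
# (static) executor allocation, i.e. with dynamic allocation switched off.
# The instance/core/memory values used below are placeholders only.


def static_allocation_conf(instances: int, cores: int, memory: str) -> dict:
    """Return Spark properties that pin the executor count up front."""
    if instances < 1 or cores < 1:
        raise ValueError("instances and cores must be positive")
    return {
        "spark.dynamicAllocation.enabled": "false",
        "spark.executor.instances": str(instances),
        "spark.executor.cores": str(cores),
        "spark.executor.memory": memory,
    }


def to_submit_args(conf: dict) -> list:
    """Flatten a property dict into ['--conf', 'key=value', ...]."""
    args = []
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    conf = static_allocation_conf(instances=10, cores=4, memory="8g")
    print(" ".join(to_submit_args(conf)))
```

With a fixed executor count, all executors are requested when the application starts rather than being added as tasks back up, which is one plausible reason the static run behaves differently.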
May I ask the team what could make the Spark job run slower on Kubernetes, and what can be done to make it faster? Note: I am using Spark 3.2 in both.