Hello Karan,
I am using open-source Spark on Kubernetes and the Spark build bundled with MapR on YARN.

Launching the job takes the same time with both approaches: 10 seconds.

For shuffle, I am using local storage on both YARN and Kubernetes.

On Apr 11, 2025, at 11:24 AM, karan alang <karan.al...@gmail.com> wrote:


Hi Prem,

Which distribution of Spark are you using?
How long does it take to launch the job?
With respect to Spark shuffle, which approach are you using: storing shuffle data in MinIO or using a host path?

Regards,
Karan

On Fri, Apr 11, 2025 at 4:58 AM Prem Sahoo <prem.re...@gmail.com> wrote:
Hello Team,
I have a peculiar case of Spark slowness.
I am using MinIO as object storage, which Spark reads from and writes to. With YARN as the master, a Spark job takes ~5 minutes; the same job run with Kubernetes as the master takes ~8 minutes.

I checked the Spark DAG in both and observed the same number of jobs, stages, and tasks. I am using the same machines for both YARN and Kubernetes.

One observation: when I disabled Spark dynamic allocation and assigned static allocation instead, the execution time of the Kubernetes-based Spark job came down to ~5.5 minutes.
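
For reference, the static-allocation setup described above could be sketched roughly as follows. This is only an illustration, not the exact configuration used in the job: `spark.dynamicAllocation.enabled` and `spark.executor.instances` are standard Spark properties, but the executor count of 10 and the helper name `to_submit_args` are assumptions made up for the example.

```python
# Sketch: build spark-submit "--conf" flags for a static-allocation run.
# The property names are standard Spark settings; the values are assumed,
# since the thread does not state the actual executor count.
STATIC_ALLOCATION_CONF = {
    # Turn off dynamic allocation so executors are requested once, up front.
    "spark.dynamicAllocation.enabled": "false",
    # Fixed number of executors (hypothetical value).
    "spark.executor.instances": "10",
}


def to_submit_args(conf):
    """Turn a dict of Spark properties into a flat list of --conf arguments."""
    args = []
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    # Print the flags so they can be pasted into a spark-submit command line.
    print(" ".join(to_submit_args(STATIC_ALLOCATION_CONF)))
```

Passing the same flags to both the YARN and the Kubernetes submission keeps the executor count identical, so any remaining time difference is not due to executor scale-up.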

May I ask the team what could cause the Spark job to run slower on Kubernetes, and what can be done to make it faster?
Note: I am using Spark 3.2 in both.
