Hi Prem,

Which distribution of Spark are you using? How long does it take to launch the job? With respect to Spark shuffle, what approach are you using: storing shuffle data in MinIO, or using a host path?
regds,
Karan

On Fri, Apr 11, 2025 at 4:58 AM Prem Sahoo <prem.re...@gmail.com> wrote:
> Hello Team,
>
> I have a peculiar case of Spark slowness.
> I am using MinIO as object storage, from which Spark reads and writes data. When I use YARN as the master, a Spark job takes ~5 mins; the same job run with Kubernetes as the master takes ~8 mins.
>
> I checked the Spark DAG in both and observed the same number of jobs, stages, and tasks. I am using the same machines for YARN and Kubernetes.
>
> One observation: when I disabled Spark dynamic allocation and assigned static allocation instead, the execution time of the Kubernetes-based Spark job dropped to ~5.5 mins.
>
> May I ask the team what could be the reason the Spark job runs slower on Kubernetes, and what can be done to make it faster?
> Note: I am using Spark 3.2 in both.
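A minimal sketch, assuming Spark 3.x on Kubernetes, of the two allocation modes compared in the thread, expressed as spark-submit --conf flags. The property names are standard Spark settings. The executor count of 4 and the helper names allocation_conf and to_submit_args are hypothetical placeholders, not values from Prem's setup. Kubernetes has no external shuffle service, so dynamic allocation there typically relies on spark.dynamicAllocation.shuffleTracking.enabled to keep executors that still hold shuffle data.

```python
# Sketch: build Spark allocation properties and render them as spark-submit flags.
# Executor counts are hypothetical placeholders, not values from the thread.


def allocation_conf(static: bool, executors: int = 4) -> dict[str, str]:
    """Return Spark properties for static or dynamic executor allocation."""
    if static:
        # Static allocation: a fixed number of executors for the whole job.
        return {
            "spark.dynamicAllocation.enabled": "false",
            "spark.executor.instances": str(executors),
        }
    # Dynamic allocation without an external shuffle service (as on Kubernetes):
    # shuffle tracking keeps alive executors that store shuffle data for active jobs.
    return {
        "spark.dynamicAllocation.enabled": "true",
        "spark.dynamicAllocation.shuffleTracking.enabled": "true",
        "spark.dynamicAllocation.initialExecutors": str(executors),
        "spark.dynamicAllocation.maxExecutors": str(executors),
    }


def to_submit_args(conf: dict[str, str]) -> list[str]:
    """Flatten a property dict into sorted spark-submit --conf arguments."""
    args: list[str] = []
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    # Prints the flags for the static-allocation run.
    print(" ".join(to_submit_args(allocation_conf(static=True))))
```

Run as a script, it prints `--conf spark.dynamicAllocation.enabled=false --conf spark.executor.instances=4`. Running the same job once with each mode, as Prem did, helps isolate how much of the YARN/Kubernetes gap comes from executor scaling rather than from the job itself.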