Re: pyspark loop optimization

2022-01-12 Thread Ramesh Natarajan
…ect of your loops on the explain plan
> - that should give some details.
>
> Regards,
> Gourav Sengupta
>
>> On Mon, Jan 10, 2022 at 10:49 PM Ramesh Natarajan wrote:
>> I want to compute cume_dist on a bunch of columns in a spark dataframe, but
>> want to re

pyspark loop optimization

2022-01-10 Thread Ramesh Natarajan
I want to compute cume_dist on a bunch of columns in a Spark DataFrame, but want to remove NULL values before doing so. I have this loop in PySpark. While it works, I see the driver running at 100% while the executors are idle for the most part. I am reading that running a loop is an anti-pattern