subject:"Calling Pyspark functions in parallel"

Re: Calling Pyspark functions in parallel

2018-03-19 Thread Debabrata Ghosh

Thanks, Jules! I appreciate it a lot indeed!

On Mon, Mar 19, 2018 at 7:16 PM, Jules Damji wrote:
> What’s your PySpark function? Is it a UDF? If so consider using pandas UDF
> introduced in Spark 2.3.
>
> More info here: https://databricks.com/blog/2017/10/30/introducing-

Re: Calling Pyspark functions in parallel

2018-03-19 Thread Jules Damji

What’s your PySpark function? Is it a UDF? If so consider using pandas UDF introduced in Spark 2.3.

More info here: https://databricks.com/blog/2017/10/30/introducing-vectorized-udfs-for-pyspark.html

Sent from my iPhone
Pardon the dumb thumb typos :)

> On Mar 18, 2018, at 10:54 PM,

Calling Pyspark functions in parallel

2018-03-18 Thread Debabrata Ghosh

Hi, My dataframe has 2000 rows. Processing each row takes 3 seconds, so processing them sequentially takes 2000 * 3 = 6000 seconds, which is far too long. I am therefore considering running the function in parallel. For example, I would like to divide the
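
The arithmetic in the question, and the general idea of spreading per-row work over several workers, can be sketched as follows. This is only an illustration under stated assumptions: it uses Python's standard `concurrent.futures` thread pool rather than Spark itself, `process_row` is a hypothetical stand-in for the poster's function (which the thread does not show), and the worker count of 8 is arbitrary. The estimate is an ideal figure that ignores scheduling overhead.

```python
import time
from concurrent.futures import ThreadPoolExecutor

ROWS = 2000          # row count quoted in the post
SECONDS_PER_ROW = 3  # per-row cost quoted in the post


def estimated_seconds(rows, seconds_per_row, workers):
    """Ideal wall-clock time if rows are spread evenly over the workers."""
    rows_per_worker = -(-rows // workers)  # ceiling division
    return rows_per_worker * seconds_per_row


def process_row(row):
    """Hypothetical stand-in for the slow per-row function."""
    time.sleep(0.001)  # scaled-down placeholder for the 3-second call
    return row * 2


def process_in_parallel(rows, workers):
    """Apply process_row to every row using a pool of worker threads."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map returns results in the same order as the inputs.
        return list(pool.map(process_row, rows))


sequential = estimated_seconds(ROWS, SECONDS_PER_ROW, workers=1)
parallel = estimated_seconds(ROWS, SECONDS_PER_ROW, workers=8)
results = process_in_parallel(range(ROWS), workers=8)
```

Threads suit a function that mostly waits on I/O; for CPU-bound work a process pool, or Spark's own partition-level parallelism, is likely the better fit.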