Thanks, Jules! Appreciate it a lot indeed!
On Mon, Mar 19, 2018 at 7:16 PM, Jules Damji wrote:
> What’s your PySpark function? Is it a UDF? If so, consider using a pandas UDF, introduced in Spark 2.3.
>
> More info here:
> https://databricks.com/blog/2017/10/30/introducing-vectorized-udfs-for-pyspark.html
>
> Sent from my iPhone
> Pardon the dumb thumb typos :)
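For context, a pandas (vectorized) UDF receives a whole pandas Series per batch instead of one Python object per row, which removes most of the per-row overhead. A minimal sketch of the vectorized style — the function name `plus_one` and the sample values are illustrative, and the PySpark registration is shown only in comments:

```python
import pandas as pd

# The function Spark would invoke once per batch: it takes and
# returns a pandas Series, so pandas/NumPy do the arithmetic in
# bulk instead of Python looping over 2000 individual rows.
def plus_one(v: pd.Series) -> pd.Series:
    return v + 1.0

# In PySpark 2.3+ this would be registered roughly as:
#   from pyspark.sql.functions import pandas_udf
#   plus_one_udf = pandas_udf(plus_one, "double")
#   df = df.withColumn("out", plus_one_udf(df["v"]))

# Plain-pandas demonstration of the batch semantics:
batch = pd.Series([1.0, 2.0, 3.0])
print(plus_one(batch).tolist())  # → [2.0, 3.0, 4.0]
```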
> On Mar 18, 2018, at 10:54 PM,
Hi,
My dataframe has 2000 rows. Processing each row takes about 3 seconds,
so running them sequentially takes 2000 * 3 = 6000 seconds, which is
far too long.
Further, I am contemplating running the function in parallel.
For example, I would like to divide the
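Although the message is cut off here, the divide-and-process-in-parallel idea it describes can be sketched locally with Python's `concurrent.futures`. The function `process_row` is a hypothetical stand-in for the real ~3-second-per-row computation, and the sketch assumes that work mostly waits on I/O or otherwise releases the GIL; for CPU-bound pure-Python work, worker processes or Spark executors would be needed instead of threads:

```python
from concurrent.futures import ThreadPoolExecutor

def process_row(x):
    # Hypothetical stand-in for the real ~3-second-per-row computation.
    return x * 2

def process_all(rows, workers=8):
    # Divide the rows among worker threads so chunks run concurrently
    # instead of strictly one after another; wall-clock time drops
    # roughly by the number of effective workers.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(process_row, rows))

print(process_all(range(2000))[:3])  # → [0, 2, 4]
```

With 8 workers and 3 seconds per row, 2000 rows would take roughly 2000 * 3 / 8 = 750 seconds instead of 6000, ignoring scheduling overhead.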