[
https://issues.apache.org/jira/browse/SPARK-40307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Xinrong Meng updated SPARK-40307:
---------------------------------
Description:
A Python user-defined function (UDF) enables users to run arbitrary code against
PySpark columns. It uses Pickle for (de)serialization and executes row by row.
One major performance bottleneck of Python UDFs is (de)serialization, that is,
the data interchange between the worker JVM and the spawned Python subprocess
that actually executes the UDF. We should seek an alternative to handle the
(de)serialization: Arrow, which is already used in the (de)serialization of
Pandas UDFs.
was:
Python user-defined function (UDF) enables users to run arbitrary code against
PySpark columns. It uses Pickle for (de)serialization, and executes row by row.
One major performance bottleneck of Python UDFs is (de)serialization, that is,
the data interchanging between the worker JVM and the spawned Python subprocess
which actually executes the UDF. We should seek for an alternative to handle
the (de)serialization: Arrow, which is used in (de)serialization of Pandas UDF
already.
> Optimize (De)Serialization of Python UDFs by Arrow
> --------------------------------------------------
>
> Key: SPARK-40307
> URL: https://issues.apache.org/jira/browse/SPARK-40307
> Project: Spark
> Issue Type: Umbrella
> Components: PySpark
> Affects Versions: 3.4.0
> Reporter: Xinrong Meng
> Priority: Major
>
> A Python user-defined function (UDF) enables users to run arbitrary code
> against PySpark columns. It uses Pickle for (de)serialization and executes
> row by row.
> One major performance bottleneck of Python UDFs is (de)serialization, that
> is, the data interchange between the worker JVM and the spawned Python
> subprocess that actually executes the UDF. We should seek an alternative to
> handle the (de)serialization: Arrow, which is already used in the
> (de)serialization of Pandas UDFs.
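
The difference between row-by-row pickling and a columnar batch can be
sketched with the standard library alone. This is only an illustration under
stated assumptions, not Spark's actual code path: the function names below
(udf, row_by_row, columnar_batch) are hypothetical, and the stdlib array
module stands in for an Arrow record batch, which it resembles only in that
the whole column travels as one contiguous buffer.

```python
import pickle
from array import array


def udf(x: int) -> int:
    # Hypothetical user function, applied to one value at a time.
    return x + 1


def row_by_row(values):
    """Pickle every row separately, mimicking per-row (de)serialization."""
    out = []
    for v in values:
        payload = pickle.dumps(v)                 # one payload per input row
        result = udf(pickle.loads(payload))
        out.append(pickle.loads(pickle.dumps(result)))  # one payload per output row
    return out


def columnar_batch(values):
    """Ship the whole column as one buffer (a stand-in for an Arrow batch)."""
    payload = array("q", values).tobytes()        # single buffer for all inputs
    column = array("q")
    column.frombytes(payload)
    results = array("q", (udf(v) for v in column))
    returned = array("q")
    returned.frombytes(results.tobytes())         # single buffer for all outputs
    return returned.tolist()


values = list(range(1000))
# Both paths compute the same results; only the serialization pattern differs.
assert row_by_row(values) == columnar_batch(values)
```

The first function performs four (de)serialization calls per row, while the
second performs a fixed number per column regardless of row count, which is
the kind of overhead the proposal aims to reduce. Timing the two with timeit
on a given machine would show how large the gap is there.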