[ https://issues.apache.org/jira/browse/SPARK-41661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Xinrong Meng updated SPARK-41661:
---------------------------------
Description:
See design doc
[here|https://docs.google.com/document/d/e/2PACX-1vRXF8nTdjwH0LbYyp3b6Zt6STEKWsvfKSO7_s4foOB-3zJ2h4_06JF147hUPlADJxZ_X22RFxgZ-fRS/pub].
User-defined Functions in Python consist of (pickled) Python UDFs and
(Arrow-optimized) Pandas UDFs. They enable users to run arbitrary Python code
on top of the Apache Spark™ engine. Users only have to state "what to do";
PySpark, as a sandbox, encapsulates "how to do it".
Spark Connect Python Client (SCPC), a client-server interface for PySpark,
will eventually (probably in Spark 4.0) replace the legacy PySpark API in
both OSS. Supporting PySpark UDFs is essential for Spark Connect to reach
parity with the legacy PySpark API.
was:
See design doc
[here|https://docs.google.com/document/d/e/2PACX-1vRXF8nTdjwH0LbYyp3b6Zt6STEKWsvfKSO7_s4foOB-3zJ2h4_06JF147hUPlADJxZ_X22RFxgZ-fRS/pub].
PySpark UDFs mainly consist of (pickled) Python UDFs and (Arrow-optimized)
Pandas UDFs.
> Support for User-defined Functions in Python
> --------------------------------------------
>
> Key: SPARK-41661
> URL: https://issues.apache.org/jira/browse/SPARK-41661
> Project: Spark
> Issue Type: Umbrella
> Components: Connect
> Affects Versions: 3.4.0
> Reporter: Martin Grund
> Assignee: Xinrong Meng
> Priority: Major