[ https://issues.apache.org/jira/browse/SPARK-49847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Haejoon Lee updated SPARK-49847:
--------------------------------
Description:
This umbrella ticket aims to ensure full compatibility between PySpark and Spark
Connect by implementing and thoroughly testing all PySpark functionality so that
it works seamlessly with Spark Connect.
The [initial work|https://github.com/apache/spark/pull/48085] introduces the
*{{test_connect_compatibility.py}}* test suite, which validates signature
compatibility for core APIs such as *DataFrame*, *Column*, and *SparkSession*.
The suite also checks for missing APIs and properties that still need to be
supported by Spark Connect.
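For illustration, the following is a minimal sketch of how such a signature-compatibility check could look. The import paths and the helper name ({{compare_signatures}}) are assumptions for this sketch, not the exact layout used by {{test_connect_compatibility.py}}:
{code:python}
import inspect

# Minimal sketch of a signature-compatibility check between classic PySpark
# and Spark Connect. The import paths below are assumptions and may differ
# across Spark versions; they are illustrative only.
from pyspark.sql.dataframe import DataFrame as ClassicDataFrame
from pyspark.sql.connect.dataframe import DataFrame as ConnectDataFrame


def compare_signatures(classic_cls, connect_cls):
    """Return (missing, mismatched): public methods absent from Connect, and
    methods whose signatures differ between the two classes."""
    classic = {
        name: member
        for name, member in inspect.getmembers(classic_cls, callable)
        if not name.startswith("_")
    }
    connect = {
        name: member
        for name, member in inspect.getmembers(connect_cls, callable)
        if not name.startswith("_")
    }

    missing = sorted(set(classic) - set(connect))
    mismatched = []
    for name in sorted(set(classic) & set(connect)):
        try:
            if inspect.signature(classic[name]) != inspect.signature(connect[name]):
                mismatched.append(name)
        except (TypeError, ValueError):
            # Some members do not expose an inspectable signature; skip them.
            continue
    return missing, mismatched


if __name__ == "__main__":
    missing, mismatched = compare_signatures(ClassicDataFrame, ConnectDataFrame)
    print("Missing in Spark Connect:", missing)
    print("Signature mismatches:", mismatched)
{code}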
Key goals for this project:
* Ensure that all PySpark APIs are fully functional in Spark Connect.
* Identify discrepancies in API signatures between PySpark and Spark Connect.
* Find missing APIs and properties, and add the necessary functionality to Spark
Connect.
* Create comprehensive tests to prevent regressions and ensure long-term
compatibility (see the test sketch after this list).
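As a rough illustration of the last goal, a regression test could wrap the {{compare_signatures}} sketch above in a standard {{unittest}} case. The test class and allow-list below are hypothetical and only outline the idea; they are not the actual contents of {{test_connect_compatibility.py}}:
{code:python}
import unittest

# Hypothetical regression test building on the compare_signatures sketch above
# (reuses compare_signatures, ClassicDataFrame, and ConnectDataFrame).
class DataFrameConnectCompatibilityTest(unittest.TestCase):
    # APIs that are known and accepted to be unsupported in Spark Connect can
    # be allow-listed so the test only fails on newly introduced discrepancies.
    EXPECTED_MISSING = frozenset()

    def test_dataframe_signature_compatibility(self):
        missing, mismatched = compare_signatures(ClassicDataFrame, ConnectDataFrame)
        self.assertEqual(
            set(missing) - self.EXPECTED_MISSING,
            set(),
            "DataFrame APIs missing from Spark Connect",
        )
        self.assertEqual(
            mismatched,
            [],
            "DataFrame API signatures diverge between classic PySpark and Spark Connect",
        )


if __name__ == "__main__":
    unittest.main()
{code}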
Further work will involve extending the test coverage to all critical PySpark
modules and ensuring compatibility with Spark Connect in future releases.
was:
This aims to ensure full compatibility between PySpark and Spark Connect by
thoroughly testing and validating that all functionalities in PySpark work
seamlessly with Spark Connect.
The [initial work|https://github.com/apache/spark/pull/48085] includes the
creation of the *{{test_connect_compatibility.py}}* test suite, which validates
the signature compatibility for core components such as {*}DataFrame{*},
{*}Column{*}, and *SparkSession* APIs. This test suite also includes checks for
missing APIs and properties that need to be supported by Spark Connect.
Key goals for this project:
* Ensure that all PySpark APIs are fully functional in Spark Connect.
* Identify discrepancies in API signatures between PySpark and Spark Connect.
* Verify missing APIs and properties, and add necessary functionality to Spark
Connect.
* Create comprehensive tests to prevent regressions and ensure long-term
compatibility.
Further work will involve extending the test coverage to all critical PySpark
modules and ensuring compatibility with Spark Connect in future releases.
> PySpark compatibility with Spark Connect
> ----------------------------------------
>
> Key: SPARK-49847
> URL: https://issues.apache.org/jira/browse/SPARK-49847
> Project: Spark
> Issue Type: Umbrella
> Components: Connect, PySpark
> Affects Versions: 4.0.0
> Reporter: Haejoon Lee
> Assignee: Haejoon Lee
> Priority: Critical