[
https://issues.apache.org/jira/browse/SPARK-49847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Haejoon Lee updated SPARK-49847:
--------------------------------
Description:
This umbrella ticket aims to ensure full compatibility between PySpark and
Spark Connect by thoroughly testing and validating that all PySpark
functionality works seamlessly with Spark Connect.
The [initial work|https://github.com/apache/spark/pull/48085] adds the
*{{test_connect_compatibility.py}}* test suite, which validates signature
compatibility for core APIs such as {*}DataFrame{*}, {*}Column{*}, and
{*}SparkSession{*}. The suite also checks for APIs and properties that are
missing from Spark Connect and still need to be supported.
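For illustration, below is a minimal sketch of the kind of signature check such a
suite might perform; the module paths and the {{public_methods}} helper are
assumptions based on recent PySpark layouts, not the actual test code from the PR.
{code:python}
# Sketch (assumption): compare public DataFrame method signatures between
# classic PySpark and the Spark Connect client using the stdlib inspect module.
# Module paths may differ between PySpark releases.
import inspect

from pyspark.sql.dataframe import DataFrame as ClassicDataFrame
from pyspark.sql.connect.dataframe import DataFrame as ConnectDataFrame


def public_methods(cls):
    """Return {name: Signature} for public callables defined on cls."""
    return {
        name: inspect.signature(member)
        for name, member in inspect.getmembers(cls, callable)
        if not name.startswith("_")
    }


classic = public_methods(ClassicDataFrame)
connect = public_methods(ConnectDataFrame)

# APIs present in classic PySpark but absent from the Connect client.
missing = sorted(set(classic) - set(connect))

# APIs present in both whose signatures do not match.
mismatched = sorted(
    name for name in set(classic) & set(connect)
    if classic[name] != connect[name]
)

print("Missing in Spark Connect:", missing)
print("Signature mismatches:", mismatched)
{code}
The same comparison can be repeated for {*}Column{*} and {*}SparkSession{*} by
swapping in the corresponding classic and Connect classes.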
Key goals for this project:
* Ensure that all PySpark APIs are fully functional in Spark Connect.
* Identify discrepancies in API signatures between PySpark and Spark Connect.
* Detect missing APIs and properties, and add the necessary functionality to
Spark Connect (see the property check sketched after this list).
* Create comprehensive tests to prevent regressions and ensure long-term
compatibility.
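As a similarly hedged sketch of the property check mentioned above (assumed
module paths and a hypothetical {{public_properties}} helper, not the actual
suite):
{code:python}
# Sketch (assumption): list public properties exposed by the classic DataFrame
# that the Spark Connect DataFrame does not define.
import inspect

from pyspark.sql.dataframe import DataFrame as ClassicDataFrame
from pyspark.sql.connect.dataframe import DataFrame as ConnectDataFrame


def public_properties(cls):
    """Names of public properties defined on cls."""
    return {
        name
        for name, member in inspect.getmembers(cls)
        if isinstance(member, property) and not name.startswith("_")
    }


missing_properties = sorted(
    public_properties(ClassicDataFrame) - public_properties(ConnectDataFrame)
)
print("Properties missing in Spark Connect:", missing_properties)
{code}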
Further work will extend test coverage to all critical PySpark modules and
ensure that compatibility with Spark Connect is maintained in future releases.
was:
This aims to ensure full compatibility between PySpark and Spark Connect by
thoroughly testing and validating that all functionalities in PySpark work
seamlessly with Spark Connect.
The initial work includes the creation of the
*{{test_connect_compatibility.py}}* test suite, which validates the signature
compatibility for core components such as {*}DataFrame{*}, {*}Column{*}, and
*SparkSession* APIs. This test suite also includes checks for missing APIs and
properties that need to be supported by Spark Connect.
Key goals for this project:
* Ensure that all PySpark APIs are fully functional in Spark Connect.
* Identify discrepancies in API signatures between PySpark and Spark Connect.
* Verify missing APIs and properties, and add necessary functionality to Spark
Connect.
* Create comprehensive tests to prevent regressions and ensure long-term
compatibility.
Further work will involve extending the test coverage to all critical PySpark
modules and ensuring compatibility with Spark Connect in future releases.
> PySpark compatibility with Spark Connect
> ----------------------------------------
>
> Key: SPARK-49847
> URL: https://issues.apache.org/jira/browse/SPARK-49847
> Project: Spark
> Issue Type: Umbrella
> Components: Connect, PySpark
> Affects Versions: 4.0.0
> Reporter: Haejoon Lee
> Priority: Major
>