[
https://issues.apache.org/jira/browse/SPARK-49847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Hyukjin Kwon updated SPARK-49847:
---------------------------------
Priority: Critical (was: Major)
> PySpark compatibility with Spark Connect
> ----------------------------------------
>
> Key: SPARK-49847
> URL: https://issues.apache.org/jira/browse/SPARK-49847
> Project: Spark
> Issue Type: Umbrella
> Components: Connect, PySpark
> Affects Versions: 4.0.0
> Reporter: Haejoon Lee
> Assignee: Haejoon Lee
> Priority: Critical
>
> This umbrella issue aims to ensure full compatibility between PySpark and
> Spark Connect by testing and validating that every PySpark feature works
> with Spark Connect.
> The [initial work|https://github.com/apache/spark/pull/48085] adds the
> *{{test_connect_compatibility.py}}* test suite, which validates signature
> compatibility for core APIs such as *DataFrame*, *Column*, and
> *SparkSession*. The suite also checks for APIs and properties that are
> missing from Spark Connect and still need to be supported.
> Key goals for this project:
> * Ensure that all PySpark APIs are fully functional in Spark Connect.
> * Identify discrepancies in API signatures between PySpark and Spark Connect.
> * Find missing APIs and properties, and add the necessary functionality to
> Spark Connect.
> * Create comprehensive tests to prevent regressions and ensure long-term
> compatibility.
> Further work will involve extending the test coverage to all critical PySpark
> modules and ensuring compatibility with Spark Connect in future releases.
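
The signature and missing-API checks described above could look roughly like the sketch below. It is not the code in test_connect_compatibility.py: the helper names (public_callables, compare_apis) and the two stand-in classes are hypothetical and exist only for illustration. It uses only the standard inspect module, so it runs without a Spark installation.

```python
import inspect


def public_callables(cls):
    """Map each public callable attribute of cls to its inspect.Signature."""
    return {
        name: inspect.signature(member)
        for name, member in inspect.getmembers(cls, callable)
        if not name.startswith("_")
    }


def compare_apis(classic_cls, connect_cls):
    """Return (missing, mismatched) method names of connect_cls relative to classic_cls."""
    classic = public_callables(classic_cls)
    connect = public_callables(connect_cls)
    # Methods present on the classic class but absent from the Connect class.
    missing = sorted(set(classic) - set(connect))
    # Methods present on both whose signatures differ in any way.
    mismatched = sorted(
        name
        for name in set(classic) & set(connect)
        if classic[name] != connect[name]
    )
    return missing, mismatched


# Hypothetical stand-ins for a classic class and its Connect counterpart.
class ClassicFrame:
    def select(self, *cols): ...
    def limit(self, num): ...
    def toJSON(self, use_unicode=True): ...


class ConnectFrame:
    def select(self, *cols): ...
    def limit(self, n): ...  # parameter name differs from the classic class


missing, mismatched = compare_apis(ClassicFrame, ConnectFrame)
print(missing)     # ['toJSON']
print(mismatched)  # ['limit']
```

In a real check, the classic and Spark Connect implementations of DataFrame, Column, or SparkSession would replace the stand-ins, and the two lists would be asserted empty or compared against an explicit list of known gaps.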