Haejoon Lee created SPARK-49847:
-----------------------------------
Summary: PySpark compatibility with Spark Connect
Key: SPARK-49847
URL: https://issues.apache.org/jira/browse/SPARK-49847
Project: Spark
Issue Type: Umbrella
Components: Connect, PySpark
Affects Versions: 4.0.0
Reporter: Haejoon Lee
This umbrella aims to ensure full compatibility between PySpark and Spark
Connect by testing and validating that all PySpark functionality works the
same way under Spark Connect.
The initial work includes the creation of the
*{{test_connect_compatibility.py}}* test suite, which validates signature
compatibility for core components such as the *DataFrame*, *Column*, and
*SparkSession* APIs. The suite also checks for missing APIs and properties
that need to be supported by Spark Connect.
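As a rough illustration of what a signature-compatibility check can look like, here is a minimal sketch using only the standard library. It is not the code in {{test_connect_compatibility.py}}: the classes {{ClassicFrame}} and {{ConnectFrame}} and the helpers {{public_methods}} and {{signature_mismatches}} are hypothetical stand-ins invented for this example, and the renamed parameter is made up to produce a mismatch.

```python
import inspect


class ClassicFrame:
    """Hypothetical stand-in for a classic PySpark class (not the real API)."""

    def limit(self, num):
        ...

    def sample(self, withReplacement=None, fraction=None, seed=None):
        ...


class ConnectFrame:
    """Hypothetical stand-in for the Spark Connect counterpart."""

    def limit(self, n):  # parameter renamed on purpose to create a mismatch
        ...

    def sample(self, withReplacement=None, fraction=None, seed=None):
        ...


def public_methods(cls):
    """Map each public function defined on cls to its inspect.Signature."""
    return {
        name: inspect.signature(member)
        for name, member in inspect.getmembers(cls, inspect.isfunction)
        if not name.startswith("_")
    }


def signature_mismatches(classic_cls, connect_cls):
    """Return sorted names of methods present on both classes whose signatures differ."""
    classic = public_methods(classic_cls)
    connect = public_methods(connect_cls)
    return sorted(
        name
        for name in classic.keys() & connect.keys()
        if classic[name] != connect[name]
    )


print(signature_mismatches(ClassicFrame, ConnectFrame))  # ['limit']
```

Comparing {{inspect.Signature}} objects catches differences in parameter names, order, kinds, defaults, and annotations in one equality check. A real suite would presumably import the actual classic and Connect classes instead of the stand-ins.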
Key goals for this project:
* Ensure that all PySpark APIs are fully functional in Spark Connect.
* Identify discrepancies in API signatures between PySpark and Spark Connect.
* Identify missing APIs and properties, and add the necessary functionality to
Spark Connect.
* Create comprehensive tests to prevent regressions and ensure long-term
compatibility.
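The missing-API goal above could be checked along these lines. This is only a sketch under stated assumptions: {{ClassicSession}}, {{ConnectSession}}, their members, and the helpers {{public_names}} and {{missing_members}} are all hypothetical and invented for illustration, so the gaps it reports say nothing about the real Spark Connect API.

```python
import inspect


class ClassicSession:
    """Hypothetical stand-in for a classic PySpark class (not the real API)."""

    @property
    def version(self):
        ...

    @property
    def classicOnlyProperty(self):  # invented member, absent from ConnectSession
        ...

    def range(self, start, end=None):
        ...

    def classicOnlyMethod(self):  # invented member, absent from ConnectSession
        ...


class ConnectSession:
    """Hypothetical stand-in for the Spark Connect counterpart."""

    @property
    def version(self):
        ...

    def range(self, start, end=None):
        ...


def is_property(member):
    """Predicate for inspect.getmembers that selects property objects."""
    return isinstance(member, property)


def public_names(cls, predicate):
    """Return the set of public member names on cls that satisfy predicate."""
    return {
        name
        for name, _ in inspect.getmembers(cls, predicate)
        if not name.startswith("_")
    }


def missing_members(classic_cls, connect_cls):
    """Report public methods and properties that only the classic class has."""
    return {
        "methods": sorted(
            public_names(classic_cls, inspect.isfunction)
            - public_names(connect_cls, inspect.isfunction)
        ),
        "properties": sorted(
            public_names(classic_cls, is_property)
            - public_names(connect_cls, is_property)
        ),
    }


print(missing_members(ClassicSession, ConnectSession))
```

To guard against regressions, a test could compare this report with an explicit list of known, accepted gaps and fail whenever a new name appears.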
Further work will involve extending the test coverage to all critical PySpark
modules and ensuring compatibility with Spark Connect in future releases.