Haejoon Lee created SPARK-49847:
-----------------------------------

             Summary: PySpark compatibility with Spark Connect
                 Key: SPARK-49847
                 URL: https://issues.apache.org/jira/browse/SPARK-49847
             Project: Spark
          Issue Type: Umbrella
          Components: Connect, PySpark
    Affects Versions: 4.0.0
            Reporter: Haejoon Lee


This umbrella issue aims to ensure full compatibility between PySpark and 
Spark Connect by testing and validating that PySpark functionality behaves 
the same way when run through Spark Connect.

The initial work includes the creation of the 
test_connect_compatibility.py test suite, which validates signature 
compatibility for core components such as the DataFrame, Column, and 
SparkSession APIs. The suite also checks for APIs and properties that are 
missing from Spark Connect and still need to be supported.
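
A signature check of this kind can be sketched with the standard inspect 
module. This is only an illustration, not the contents of 
test_connect_compatibility.py: the helper name compare_signatures and the two 
stand-in classes are hypothetical, and the real suite would presumably compare 
the classic and Connect implementations of DataFrame, Column, and 
SparkSession instead.

```python
import inspect


def compare_signatures(classic_cls, connect_cls):
    """Return {name: (classic_sig, connect_sig)} for public methods that
    exist on both classes but whose signatures differ."""
    mismatches = {}
    for name, classic_fn in inspect.getmembers(classic_cls, inspect.isfunction):
        if name.startswith("_"):
            continue  # private and dunder methods are out of scope here
        connect_fn = getattr(connect_cls, name, None)
        if not inspect.isfunction(connect_fn):
            continue  # missing APIs would be reported by a separate check
        classic_sig = inspect.signature(classic_fn)
        connect_sig = inspect.signature(connect_fn)
        if classic_sig != connect_sig:
            mismatches[name] = (str(classic_sig), str(connect_sig))
    return mismatches


# Hypothetical stand-ins for a classic class and its Connect counterpart.
class ClassicFrame:
    def select(self, *cols): ...
    def limit(self, num): ...
    def sample(self, fraction, seed=None): ...


class ConnectFrame:
    def select(self, *cols): ...
    def limit(self, n): ...  # parameter name differs from the classic one
    def sample(self, fraction, seed=None): ...


print(compare_signatures(ClassicFrame, ConnectFrame))
# {'limit': ('(self, num)', '(self, n)')}
```

Comparing Signature objects directly also catches differences in parameter 
kinds, defaults, and annotations, not just in parameter names.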

Key goals for this project:
 * Ensure that all PySpark APIs are fully functional in Spark Connect.
 * Identify discrepancies in API signatures between PySpark and Spark Connect.
 * Identify missing APIs and properties, and add the necessary functionality 
to Spark Connect.
 * Create comprehensive tests to prevent regressions and ensure long-term 
compatibility.
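
The missing-API goal above could be checked roughly as follows. This is a 
sketch under assumptions: public_members and missing_in_connect are 
hypothetical helper names, and the two session classes and their member names 
are made-up stand-ins rather than real PySpark classes.

```python
import inspect


def public_members(cls):
    """Return (method_names, property_names) for the public members of cls."""
    methods = {
        name
        for name, _ in inspect.getmembers(cls, inspect.isfunction)
        if not name.startswith("_")
    }
    properties = {
        name
        for name, _ in inspect.getmembers(cls, lambda m: isinstance(m, property))
        if not name.startswith("_")
    }
    return methods, properties


def missing_in_connect(classic_cls, connect_cls):
    """Return sorted names of methods and properties that the classic class
    exposes but the Connect class does not."""
    classic_methods, classic_props = public_members(classic_cls)
    connect_methods, connect_props = public_members(connect_cls)
    return (
        sorted(classic_methods - connect_methods),
        sorted(classic_props - connect_props),
    )


# Hypothetical stand-ins; all member names are illustrative only.
class ClassicSession:
    @property
    def version(self): ...
    @property
    def legacy_context(self): ...
    def sql(self, query): ...
    def local_only_helper(self): ...


class ConnectSession:
    @property
    def version(self): ...
    def sql(self, query): ...


missing_methods, missing_props = missing_in_connect(ClassicSession, ConnectSession)
print(missing_methods)  # ['local_only_helper']
print(missing_props)    # ['legacy_context']
```

A regression test could assert that both lists match a reviewed set of known 
gaps, so that any newly introduced gap fails the build.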

Further work will extend the test coverage to the remaining critical PySpark 
modules and keep them compatible with Spark Connect in future releases.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
