This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
     new fc69194fc032 [SPARK-50310][CONNECT][PYTHON] Call `with_origin_to_class` when initializing `Column`
fc69194fc032 is described below

commit fc69194fc03212035f4b42701dcbc409e5a36b03
Author: Haejoon Lee <[email protected]>
AuthorDate: Fri Dec 6 09:37:25 2024 +0900

    [SPARK-50310][CONNECT][PYTHON] Call `with_origin_to_class` when initializing `Column`
    
    ### What changes were proposed in this pull request?
    
    This PR follows up https://github.com/apache/spark/pull/48964 to ensure `with_origin_to_class` is called only when a `Column` is actually initialized.
    
    ### Why are the changes needed?
    
    We don't need to call `with_origin_to_class` if the Column API is not used.
    
    Furthermore, the current approach does not work properly with Spark Connect because it causes a circular import error, as mentioned in https://github.com/apache/spark/pull/48964#discussion_r1868569335, so this is a safer way to support `with_origin_to_class`.
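The fix relies on a general Python property: a function-local import is resolved at call time, after both modules have finished loading, so no cycle is triggered during import. A minimal sketch of that idea (the `demo_session`/`demo_column` module names below are invented for illustration and are not Spark's real modules):

```python
# Hypothetical sketch: a deferred (function-local) import avoids the
# circular-import failure that a module-level import would cause.
import importlib
import sys
import tempfile
import textwrap
from pathlib import Path

tmp = Path(tempfile.mkdtemp())
sys.path.insert(0, str(tmp))

# demo_session imports demo_column at module load time...
(tmp / "demo_session.py").write_text(textwrap.dedent("""
    from demo_column import Column
    APP_NAME = "demo"
"""))

# ...while demo_column imports demo_session back only inside a method,
# so the cycle is never hit while the modules are still loading.
(tmp / "demo_column.py").write_text(textwrap.dedent("""
    class Column:
        def app_name(self):
            # Deferred import: resolved at call time, after both
            # modules are fully initialized in sys.modules.
            from demo_session import APP_NAME
            return APP_NAME
"""))

session = importlib.import_module("demo_session")
print(session.Column().app_name())  # -> demo
```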
    
    ### Does this PR introduce _any_ user-facing change?
    
    No API changes, but the Spark Connect Python client no longer raises a circular import error during initialization.
    
    ### How was this patch tested?
    
    Manually tested
    
    **Before (Failed initializing Spark Connect Python client)**
    ```
    % ./bin/pyspark --remote local
    Python 3.9.17 (main, Jul  5 2023, 15:35:09)
    [Clang 14.0.6 ] :: Anaconda, Inc. on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    ...
    ImportError: cannot import name 'SparkSession' from partially initialized module 'pyspark.sql.connect.session' (most likely due to a circular import) (/Users/haejoon.lee/Desktop/git_repos/spark/python/pyspark/sql/connect/session.py)
    ```
    
    **After (Successfully initializing Spark Connect Python client)**
    ```
    % ./bin/pyspark --remote local
    Welcome to
          ____              __
         / __/__  ___ _____/ /__
        _\ \/ _ \/ _ `/ __/  '_/
       /__ / .__/\_,_/_/ /_/\_\   version 4.0.0.dev0
          /_/
    
    Using Python version 3.9.17 (main, Jul  5 2023 15:35:09)
    Client connected to the Spark Connect server at localhost
    SparkSession available as 'spark'.
    >>> spark
    <pyspark.sql.connect.session.SparkSession object at 0x105c6e850>
    ```
    
    ### Was this patch authored or co-authored using generative AI tooling?
    
    No.
    
    Closes #49054 from itholic/SPARK-50310-connect.
    
    Authored-by: Haejoon Lee <[email protected]>
    Signed-off-by: Hyukjin Kwon <[email protected]>
---
 python/pyspark/sql/classic/column.py | 10 ++++++++--
 python/pyspark/sql/connect/column.py | 10 ++++++++--
 2 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/python/pyspark/sql/classic/column.py b/python/pyspark/sql/classic/column.py
index c08eac7f6a04..05fcb2162822 100644
--- a/python/pyspark/sql/classic/column.py
+++ b/python/pyspark/sql/classic/column.py
@@ -33,7 +33,6 @@ from typing import (
 
 from pyspark.sql.column import Column as ParentColumn
 from pyspark.errors import PySparkAttributeError, PySparkTypeError, PySparkValueError
-from pyspark.errors.utils import with_origin_to_class
 from pyspark.sql.types import DataType
 from pyspark.sql.utils import get_active_spark_context, enum_to_value
 
@@ -175,12 +174,19 @@ def _reverse_op(
     return Column(jc)
 
 
-@with_origin_to_class
 class Column(ParentColumn):
     def __new__(
         cls,
         jc: "JavaObject",
     ) -> "Column":
+        # We apply the `with_origin_to_class` decorator here instead of at the top of
+        # the class definition to prevent a circular import when initializing the SparkSession.
+        # See https://github.com/apache/spark/pull/49054 for more detail.
+        from pyspark.errors.utils import with_origin_to_class
+
+        if not hasattr(cls, "_with_origin_applied"):
+            cls = with_origin_to_class(cls)
+            cls._with_origin_applied = True
         self = object.__new__(cls)
         self.__init__(jc)  # type: ignore[misc]
         return self
diff --git a/python/pyspark/sql/connect/column.py b/python/pyspark/sql/connect/column.py
index e84008114634..1440c4c2792b 100644
--- a/python/pyspark/sql/connect/column.py
+++ b/python/pyspark/sql/connect/column.py
@@ -52,7 +52,6 @@ from pyspark.sql.connect.expressions import (
     WithField,
     DropField,
 )
-from pyspark.errors.utils import with_origin_to_class
 
 
 if TYPE_CHECKING:
@@ -107,12 +106,19 @@ def _to_expr(v: Any) -> Expression:
     return v._expr if isinstance(v, Column) else LiteralExpression._from_value(v)
 
 
-@with_origin_to_class(["to_plan"])
 class Column(ParentColumn):
     def __new__(
         cls,
         expr: "Expression",
     ) -> "Column":
+        # We apply the `with_origin_to_class` decorator here instead of at the top of
+        # the class definition to prevent a circular import when initializing the SparkSession.
+        # See https://github.com/apache/spark/pull/49054 for more detail.
+        from pyspark.errors.utils import with_origin_to_class
+
+        if not hasattr(cls, "_with_origin_applied"):
+            cls = with_origin_to_class(["to_plan"])(cls)
+            cls._with_origin_applied = True
         self = object.__new__(cls)
         self.__init__(expr)  # type: ignore[misc]
         return self


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
