This is an automated email from the ASF dual-hosted git repository.
gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
new fc69194fc032 [SPARK-50310][CONNECT][PYTHON] Call
`with_origin_to_class` when the `Column` is initialized
fc69194fc032 is described below
commit fc69194fc03212035f4b42701dcbc409e5a36b03
Author: Haejoon Lee <[email protected]>
AuthorDate: Fri Dec 6 09:37:25 2024 +0900
[SPARK-50310][CONNECT][PYTHON] Call `with_origin_to_class` when the
`Column` is initialized
### What changes were proposed in this pull request?
This PR follows up on https://github.com/apache/spark/pull/48964 so that
`with_origin_to_class` is only called when a `Column` is actually initialized.
### Why are the changes needed?
There is no need to call `with_origin_to_class` if the Column API is never used.
Furthermore, the current approach does not work properly with Spark Connect:
it causes a circular import error, as mentioned in
https://github.com/apache/spark/pull/48964#discussion_r1868569335, so this is
a safer way to support `with_origin_to_class`.
### Does this PR introduce _any_ user-facing change?
No API changes, but the Spark Connect Python client no longer raises a
circular import error during initialization.
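For readers unfamiliar with the technique, the fix amounts to lazy class decoration: instead of decorating the class at definition time (which forces the decorator's module to be imported at module load and can trigger a circular import), the decorator is imported and applied inside `__new__`, on first instantiation. A minimal self-contained sketch of the pattern, with hypothetical names (`add_origin` and `_decorated` stand in for `with_origin_to_class` and `_with_origin_applied`):

```python
def add_origin(cls):
    # Stand-in for `with_origin_to_class`; in PySpark it patches methods to
    # capture the user's call site, here it simply tags the class.
    cls.origin = "captured"
    return cls


class Column:
    def __new__(cls, value):
        # In the real patch the decorator is imported here, inside __new__,
        # so the import only happens once a Column is actually created.
        # The guard attribute ensures the class is decorated exactly once.
        if not hasattr(cls, "_decorated"):
            add_origin(cls)
            cls._decorated = True
        self = object.__new__(cls)
        self.value = value
        return self


c = Column(42)
print(c.origin, c.value)  # prints: captured 42
```

Because the guard is a class attribute, subsequent instantiations skip the decoration step entirely, so the per-object overhead is one `hasattr` check.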
### How was this patch tested?
Manually tested
**Before (Spark Connect Python client fails to initialize)**
```
% ./bin/pyspark --remote local
Python 3.9.17 (main, Jul 5 2023, 15:35:09)
[Clang 14.0.6 ] :: Anaconda, Inc. on darwin
Type "help", "copyright", "credits" or "license" for more information.
...
ImportError: cannot import name 'SparkSession' from partially initialized
module 'pyspark.sql.connect.session' (most likely due to a circular import)
(/Users/haejoon.lee/Desktop/git_repos/spark/python/pyspark/sql/connect/session.py)
```
**After (Spark Connect Python client initializes successfully)**
```
% ./bin/pyspark --remote local
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /__ / .__/\_,_/_/ /_/\_\   version 4.0.0.dev0
      /_/
Using Python version 3.9.17 (main, Jul 5 2023 15:35:09)
Client connected to the Spark Connect server at localhost
SparkSession available as 'spark'.
>>> spark
<pyspark.sql.connect.session.SparkSession object at 0x105c6e850>
```
### Was this patch authored or co-authored using generative AI tooling?
No.
Closes #49054 from itholic/SPARK-50310-connect.
Authored-by: Haejoon Lee <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
---
python/pyspark/sql/classic/column.py | 10 ++++++++--
python/pyspark/sql/connect/column.py | 10 ++++++++--
2 files changed, 16 insertions(+), 4 deletions(-)
diff --git a/python/pyspark/sql/classic/column.py b/python/pyspark/sql/classic/column.py
index c08eac7f6a04..05fcb2162822 100644
--- a/python/pyspark/sql/classic/column.py
+++ b/python/pyspark/sql/classic/column.py
@@ -33,7 +33,6 @@ from typing import (
 from pyspark.sql.column import Column as ParentColumn
 from pyspark.errors import PySparkAttributeError, PySparkTypeError, PySparkValueError
-from pyspark.errors.utils import with_origin_to_class
 from pyspark.sql.types import DataType
 from pyspark.sql.utils import get_active_spark_context, enum_to_value
@@ -175,12 +174,19 @@ def _reverse_op(
     return Column(jc)
-@with_origin_to_class
 class Column(ParentColumn):
     def __new__(
         cls,
         jc: "JavaObject",
     ) -> "Column":
+        # We apply `with_origin_to_class` decorator here instead of top of the class definition
+        # to prevent circular import issue when initializing the SparkSession.
+        # See https://github.com/apache/spark/pull/49054 for more detail.
+        from pyspark.errors.utils import with_origin_to_class
+
+        if not hasattr(cls, "_with_origin_applied"):
+            cls = with_origin_to_class(cls)
+            cls._with_origin_applied = True
         self = object.__new__(cls)
         self.__init__(jc)  # type: ignore[misc]
         return self
diff --git a/python/pyspark/sql/connect/column.py b/python/pyspark/sql/connect/column.py
index e84008114634..1440c4c2792b 100644
--- a/python/pyspark/sql/connect/column.py
+++ b/python/pyspark/sql/connect/column.py
@@ -52,7 +52,6 @@ from pyspark.sql.connect.expressions import (
     WithField,
     DropField,
 )
-from pyspark.errors.utils import with_origin_to_class
 if TYPE_CHECKING:
@@ -107,12 +106,19 @@ def _to_expr(v: Any) -> Expression:
     return v._expr if isinstance(v, Column) else LiteralExpression._from_value(v)
-@with_origin_to_class(["to_plan"])
 class Column(ParentColumn):
     def __new__(
         cls,
         expr: "Expression",
     ) -> "Column":
+        # We apply `with_origin_to_class` decorator here instead of top of the class definition
+        # to prevent circular import issue when initializing the SparkSession.
+        # See https://github.com/apache/spark/pull/49054 for more detail.
+        from pyspark.errors.utils import with_origin_to_class
+
+        if not hasattr(cls, "_with_origin_applied"):
+            cls = with_origin_to_class(["to_plan"])(cls)
+            cls._with_origin_applied = True
         self = object.__new__(cls)
         self.__init__(expr)  # type: ignore[misc]
         return self
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]