This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
     new a0539bf440fb [SPARK-50310][CONNECT][PYTHON][FOLLOW-UP] Delay is_debugging_enabled call after modules are initialized
a0539bf440fb is described below

commit a0539bf440fb645d14d955d386e6df2413e08d86
Author: Hyukjin Kwon <[email protected]>
AuthorDate: Fri Dec 27 15:07:56 2024 +0900

    [SPARK-50310][CONNECT][PYTHON][FOLLOW-UP] Delay is_debugging_enabled call after modules are initialized
    
    ### What changes were proposed in this pull request?
    
    This PR is a retry of https://github.com/apache/spark/pull/49054 that avoids the hacky monkey patching.
    
    ### Why are the changes needed?
    
    - This makes `spark.python.sql.dataFrameDebugging.enabled` disable `DataFrameQueryContext` for `pyspark.sql.functions` too (see the sketch below).
    - It avoids a circular import in the `pyspark-connect` package.
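
    As context, here is a minimal, generic sketch of the delayed-check pattern this diff applies: the flag is read inside the wrapper on each call, after all modules are initialized, rather than once at decoration (import) time. Names such as `_debugging_enabled` and `traced` are illustrative only, not PySpark APIs.

    ```python
    import functools
    import os


    def _debugging_enabled() -> bool:
        # Illustrative stand-in for pyspark.errors.utils.is_debugging_enabled().
        return os.environ.get("EXAMPLE_DEBUGGING_ENABLED", "true").lower() == "true"


    def traced(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # The flag is checked here on each call, after module initialization,
            # instead of once when the function is decorated at import time.
            if _debugging_enabled():
                # ... capture call-site information for error context here ...
                pass
            return func(*args, **kwargs)

        return wrapper
    ```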
    
    ### Does this PR introduce _any_ user-facing change?
    
    Yes. After this follow-up, `spark.python.sql.dataFrameDebugging.enabled` also works with `pyspark.sql.functions.*`.
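
    For example, a quick way to exercise this (a sketch: the config name comes from this commit message, and applying it through the session builder is assumed here to be one valid way to set it):

    ```python
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col

    # Assumption: the flag is read from the active session's configuration, so
    # setting it when the session is created turns the call-site capture off.
    spark = (
        SparkSession.builder
        .config("spark.python.sql.dataFrameDebugging.enabled", "false")
        .getOrCreate()
    )

    # With the flag off, expressions built via pyspark.sql.functions such as
    # col("id") skip the DataFrameQueryContext call-site capture.
    col("id")
    ```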
    
    ### How was this patch tested?
    
    Manually ran the profiler:
    
    ```python
    import cProfile
    
    from pyspark.sql.functions import col
    
    def foo():
        for _ in range(1000):
            col("id")
    
    cProfile.run('foo()', sort='tottime')
    ```
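
    The 1000 `col("id")` calls above go through the decorated path touched in this diff, so the `tottime`-sorted output gives a rough measure of the per-call overhead that remains with the delayed `is_debugging_enabled` check.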
    
    ### Was this patch authored or co-authored using generative AI tooling?
    
    No.
    
    Closes #49311 from HyukjinKwon/SPARK-50310-followup.
    
    Authored-by: Hyukjin Kwon <[email protected]>
    Signed-off-by: Hyukjin Kwon <[email protected]>
---
 python/pyspark/errors/utils.py | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/python/pyspark/errors/utils.py b/python/pyspark/errors/utils.py
index 416a2323b170..f9f60637bd57 100644
--- a/python/pyspark/errors/utils.py
+++ b/python/pyspark/errors/utils.py
@@ -255,7 +255,8 @@ def _with_origin(func: FuncT) -> FuncT:
         from pyspark.sql.utils import is_remote
 
         spark = SparkSession.getActiveSession()
-        if spark is not None and hasattr(func, "__name__"):
+
+        if spark is not None and hasattr(func, "__name__") and is_debugging_enabled():
             if is_remote():
                 global current_origin
 
@@ -313,10 +314,7 @@ def with_origin_to_class(
         return lambda cls: with_origin_to_class(cls, ignores)
     else:
         cls = cls_or_ignores
-        if (
-            os.environ.get("PYSPARK_PIN_THREAD", "true").lower() == "true"
-            and is_debugging_enabled()
-        ):
+        if os.environ.get("PYSPARK_PIN_THREAD", "true").lower() == "true":
             skipping = set(
                 ["__init__", "__new__", "__iter__", "__nonzero__", "__repr__", "__bool__"]
                 + (ignores or [])


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
