This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
     new 88cc1530a166 [SPARK-48650][PYTHON] Display correct call site from IPython Notebook
88cc1530a166 is described below

commit 88cc1530a166742a1be9ae7287263277bd1ec0f7
Author: Haejoon Lee <[email protected]>
AuthorDate: Mon Jun 24 14:40:50 2024 +0900

    [SPARK-48650][PYTHON] Display correct call site from IPython Notebook
    
    ### What changes were proposed in this pull request?
    
    This PR proposes to display the correct call site information when code is run from an IPython Notebook.
    
    ### Why are the changes needed?
    
    We added `DataFrameQueryContext` for PySpark error messages in https://github.com/apache/spark/pull/45377, but it does not work very well from IPython Notebook.
    
    ### Does this PR introduce _any_ user-facing change?
    
    No API changes, but the user-facing error message from IPython Notebook will be improved:
    
    **Before**
    <img width="1124" alt="Screenshot 2024-06-18 at 5 15 56 PM" src="https://github.com/apache/spark/assets/44108233/3e3aee2c-5bb0-4858-b392-e845b7280d31">
    
    **After**
    <img width="1163" alt="Screenshot 2024-06-19 at 8 45 05 AM" src="https://github.com/apache/spark/assets/44108233/81741d15-cac9-41e7-815a-5d84f1176c73">
    
    **NOTE:** This also works when a command is executed across multiple cells:
    
    <img width="1175" alt="Screenshot 2024-06-19 at 8 42 29 AM" src="https://github.com/apache/spark/assets/44108233/d65fbf79-d621-4ae0-b220-2f7923cc3666">
    
    ### How was this patch tested?
    
    Manually tested with IPython Notebook.
    
    ### Was this patch authored or co-authored using generative AI tooling?
    
    No
    
    Closes #47009 from itholic/error_context_on_notebook.
    
    Authored-by: Haejoon Lee <[email protected]>
    Signed-off-by: Hyukjin Kwon <[email protected]>
---
 python/pyspark/errors/utils.py | 25 +++++++++++++++++++++++--
 1 file changed, 23 insertions(+), 2 deletions(-)

diff --git a/python/pyspark/errors/utils.py b/python/pyspark/errors/utils.py
index cd3046380284..9155bfb54abe 100644
--- a/python/pyspark/errors/utils.py
+++ b/python/pyspark/errors/utils.py
@@ -21,6 +21,7 @@ import inspect
 import os
 import threading
 from typing import Any, Callable, Dict, Match, TypeVar, Type, Optional, TYPE_CHECKING
+import pyspark
 from pyspark.errors.error_classes import ERROR_CLASSES_MAP
 
 if TYPE_CHECKING:
@@ -164,9 +165,29 @@ def _capture_call_site(spark_session: "SparkSession", depth: int) -> str:
     The call site information is used to enhance error messages with the exact location
     in the user code that led to the error.
     """
-    stack = list(reversed(inspect.stack()))
+    # Filtering out PySpark code and keeping user code only
+    pyspark_root = os.path.dirname(pyspark.__file__)
+    stack = [
+        frame_info for frame_info in inspect.stack() if pyspark_root not in frame_info.filename
+    ]
+
     selected_frames = stack[:depth]
-    call_sites = [f"{frame.filename}:{frame.lineno}" for frame in selected_frames]
+
+    # We try import here since IPython is not a required dependency
+    try:
+        from IPython import get_ipython
+
+        ipython = get_ipython()
+    except ImportError:
+        ipython = None
+
+    # Identifying the cell is useful when the error is generated from IPython Notebook
+    if ipython:
+        call_sites = [
+            f"line {frame.lineno} in cell [{ipython.execution_count}]" for frame in selected_frames
+        ]
+    else:
+        call_sites = [f"{frame.filename}:{frame.lineno}" for frame in selected_frames]
     call_sites_str = "\n".join(call_sites)
 
     return call_sites_str
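
For readers who want to try the idea outside of PySpark, the filtering and formatting steps in the hunk above can be sketched in isolation. This is only an illustrative sketch, not the code that was merged: `Frame`, `format_call_sites`, and `current_execution_count` are hypothetical names, the library root is passed in as a plain string instead of being derived from `pyspark.__file__`, and the cell number is passed in explicitly so the logic can be exercised without a running notebook.

```python
from typing import Any, NamedTuple, Optional, Sequence


class Frame(NamedTuple):
    """Hypothetical stand-in for inspect.FrameInfo (only the fields used here)."""

    filename: str
    lineno: int


def format_call_sites(
    frames: Sequence[Any],
    library_root: str,
    depth: int,
    execution_count: Optional[int] = None,
) -> str:
    """Drop frames under library_root, keep `depth` frames, format them.

    `frames` may be the result of inspect.stack() or any objects that
    expose `filename` and `lineno` attributes.
    """
    # Same substring test as in the diff: a frame counts as library code
    # when the library root appears anywhere in its file path.
    user_frames = [f for f in frames if library_root not in f.filename]
    selected = user_frames[:depth]

    if execution_count is not None:
        # Notebook-style output: file names are not meaningful for cells,
        # so report the line number together with the cell counter.
        lines = [f"line {f.lineno} in cell [{execution_count}]" for f in selected]
    else:
        lines = [f"{f.filename}:{f.lineno}" for f in selected]
    return "\n".join(lines)


def current_execution_count() -> Optional[int]:
    """Return the IPython execution counter, or None outside IPython."""
    try:
        # IPython is optional, so the import is attempted lazily.
        from IPython import get_ipython
    except ImportError:
        return None
    shell = get_ipython()  # None when not running inside an IPython shell
    return shell.execution_count if shell is not None else None
```

A caller could combine the two as `format_call_sites(inspect.stack(), library_root, depth, current_execution_count())` after importing `inspect`. Passing the counter as an argument is a design choice for this sketch only; it keeps the formatting function free of the optional IPython dependency and easy to check with hand-built frames.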

