This is an automated email from the ASF dual-hosted git repository.
gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
new 88cc1530a166 [SPARK-48650][PYTHON] Display correct call site from IPython Notebook
88cc1530a166 is described below
commit 88cc1530a166742a1be9ae7287263277bd1ec0f7
Author: Haejoon Lee <[email protected]>
AuthorDate: Mon Jun 24 14:40:50 2024 +0900
[SPARK-48650][PYTHON] Display correct call site from IPython Notebook
### What changes were proposed in this pull request?
This PR proposes to display the correct call site information from IPython Notebook.
### Why are the changes needed?
We added `DataFrameQueryContext` for PySpark error messages in
https://github.com/apache/spark/pull/45377, but it does not work well
from IPython Notebook.
### Does this PR introduce _any_ user-facing change?
No API changes, but the user-facing error message from IPython Notebook
will be improved:
**Before**
<img width="1124" alt="Screenshot 2024-06-18 at 5 15 56 PM"
src="https://github.com/apache/spark/assets/44108233/3e3aee2c-5bb0-4858-b392-e845b7280d31">
**After**
<img width="1163" alt="Screenshot 2024-06-19 at 8 45 05 AM"
src="https://github.com/apache/spark/assets/44108233/81741d15-cac9-41e7-815a-5d84f1176c73">
**NOTE:** This also works when a command is executed across multiple cells:
<img width="1175" alt="Screenshot 2024-06-19 at 8 42 29 AM"
src="https://github.com/apache/spark/assets/44108233/d65fbf79-d621-4ae0-b220-2f7923cc3666">
### How was this patch tested?
Manually tested with IPython Notebook.
### Was this patch authored or co-authored using generative AI tooling?
No
Closes #47009 from itholic/error_context_on_notebook.
Authored-by: Haejoon Lee <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
---
python/pyspark/errors/utils.py | 25 +++++++++++++++++++++++--
1 file changed, 23 insertions(+), 2 deletions(-)
diff --git a/python/pyspark/errors/utils.py b/python/pyspark/errors/utils.py
index cd3046380284..9155bfb54abe 100644
--- a/python/pyspark/errors/utils.py
+++ b/python/pyspark/errors/utils.py
@@ -21,6 +21,7 @@ import inspect
 import os
 import threading
 from typing import Any, Callable, Dict, Match, TypeVar, Type, Optional, TYPE_CHECKING
+import pyspark
 from pyspark.errors.error_classes import ERROR_CLASSES_MAP
 
 if TYPE_CHECKING:
@@ -164,9 +165,29 @@ def _capture_call_site(spark_session: "SparkSession", depth: int) -> str:
     The call site information is used to enhance error messages with the exact location
     in the user code that led to the error.
     """
-    stack = list(reversed(inspect.stack()))
+    # Filtering out PySpark code and keeping user code only
+    pyspark_root = os.path.dirname(pyspark.__file__)
+    stack = [
+        frame_info for frame_info in inspect.stack() if pyspark_root not in frame_info.filename
+    ]
+
     selected_frames = stack[:depth]
-    call_sites = [f"{frame.filename}:{frame.lineno}" for frame in selected_frames]
+
+    # We try import here since IPython is not a required dependency
+    try:
+        from IPython import get_ipython
+
+        ipython = get_ipython()
+    except ImportError:
+        ipython = None
+
+    # Identifying the cell is useful when the error is generated from IPython Notebook
+    if ipython:
+        call_sites = [
+            f"line {frame.lineno} in cell [{ipython.execution_count}]" for frame in selected_frames
+        ]
+    else:
+        call_sites = [f"{frame.filename}:{frame.lineno}" for frame in selected_frames]
     call_sites_str = "\n".join(call_sites)
     return call_sites_str
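
For reference, the call-site logic in the patch can be exercised outside Spark with a minimal standalone sketch. The function name `capture_call_site` and the `pkg_root` parameter below are illustrative stand-ins (the real code hardcodes the pyspark install root); the IPython fallback behavior mirrors the patch.

```python
import inspect


def capture_call_site(pkg_root: str, depth: int) -> str:
    # Keep only frames that live outside the package itself, i.e. user code.
    stack = [f for f in inspect.stack() if pkg_root not in f.filename]
    selected_frames = stack[:depth]

    # IPython is an optional dependency, so import it lazily.
    try:
        from IPython import get_ipython

        ipython = get_ipython()
    except ImportError:
        ipython = None

    if ipython:
        # Inside a notebook the frame filename is an opaque temp path,
        # so the cell execution count is the more useful identifier.
        call_sites = [
            f"line {f.lineno} in cell [{ipython.execution_count}]" for f in selected_frames
        ]
    else:
        # Plain Python: fall back to file:line formatting.
        call_sites = [f"{f.filename}:{f.lineno}" for f in selected_frames]
    return "\n".join(call_sites)
```

Run from a plain interpreter (where `get_ipython()` is unavailable or returns `None`), `capture_call_site("/nonexistent/pyspark", 1)` yields a single `filename:lineno` entry pointing at the caller.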
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]