This is an automated email from the ASF dual-hosted git repository.
gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
new bfbd3df69956 [MINOR][PYTHON] Better error message when Python worker crashes
bfbd3df69956 is described below
commit bfbd3df699560b901c71ffef5912ace97106bca3
Author: Hyukjin Kwon <[email protected]>
AuthorDate: Mon Nov 13 17:40:22 2023 +0900
[MINOR][PYTHON] Better error message when Python worker crashes
### What changes were proposed in this pull request?
This PR improves the Python UDF error messages to be more actionable.
### Why are the changes needed?
Suppose you face a segfault error:
```python
from pyspark.sql.functions import udf
import ctypes
spark.range(1).select(udf(lambda x: ctypes.string_at(0))("id")).collect()
```
The current error message is not actionable:
```
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
...
get_return_value
raise Py4JJavaError(
py4j.protocol.Py4JJavaError: An error occurred while calling
o82.collectToPython.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task
15 in stage 1.0 failed 1 times, most recent failure: Lost task 15.0 in stage
1.0 (TID 31) (192.168.123.102 executor driver): org.apache.spark.SparkException:
Python worker exited unexpectedly (crashed)
```
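For context, the JVM side detects such a crash as an abrupt end-of-stream (an `EOFException`) while reading from the worker's pipe. This is not Spark code, just a minimal standalone sketch of why a crashed child process shows up as EOF to its parent:

```python
import subprocess
import sys

# Spawn a child that dies abruptly without writing anything to stdout,
# standing in for a crashed Python worker.
child = subprocess.Popen(
    [sys.executable, "-c", "import os; os._exit(1)"],
    stdout=subprocess.PIPE,
)

# The parent's read on the pipe returns empty bytes: end-of-stream.
# Spark's PythonRunner sees the analogous condition as an EOFException.
data = child.stdout.read()
print(data == b"")          # EOF: nothing was ever written
print(child.wait())         # child's abnormal exit code
```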
After this PR, the error message is improved as below:
```
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
...
get_return_value
raise Py4JJavaError(
py4j.protocol.Py4JJavaError: An error occurred while calling
o59.collectToPython.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task
15 in stage 0.0 failed 1 times, most recent failure: Lost task 15.0 in stage
0.0 (TID 15) (192.168.123.102 executor driver): org.apache.spark.SparkException:
Python worker exited unexpectedly (crashed). Consider setting
'spark.sql.execution.pyspark.udf.faulthandler.enabled'
or 'spark.python.worker.faulthandler.enabled' configuration to 'true'
for a better Python traceback.
```
So you can try this out:
```python
from pyspark.sql.functions import udf
import ctypes
spark.conf.set("spark.sql.execution.pyspark.udf.faulthandler.enabled",
"true")
spark.range(1).select(udf(lambda x: ctypes.string_at(0))("id")).collect()
```
that now shows where the segfault happens:
```
Caused by: org.apache.spark.SparkException: Python worker exited
unexpectedly (crashed): Fatal Python error: Segmentation fault
Current thread 0x00007ff84ae4b700 (most recent call first):
File "/.../envs/python3.9/lib/python3.9/ctypes/__init__.py", line 525 in
string_at
File "<stdin>", line 1 in <lambda>
File "/.../lib/pyspark.zip/pyspark/util.py", line 88 in wrapper
File "/.../lib/pyspark.zip/pyspark/worker.py", line 99 in <lambda>
File "/.../lib/pyspark.zip/pyspark/worker.py", line 1403 in <genexpr>
File "/.../lib/pyspark.zip/pyspark/worker.py", line 1403 in mapper
```
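The new configuration works by enabling Python's built-in `faulthandler` module in the worker, which dumps a Python traceback when a fatal signal such as SIGSEGV is received. The same mechanism can be seen standalone; this is an illustrative sketch (not Spark code) that assumes a POSIX system where a NULL dereference raises a segmentation fault:

```python
import subprocess
import sys

# Run the crashing code in a subprocess so the segfault cannot take down
# this interpreter; faulthandler.enable() installs handlers for fatal
# signals and prints the Python traceback to stderr before the crash.
crasher = """
import ctypes
import faulthandler
faulthandler.enable()
ctypes.string_at(0)  # dereference NULL -> SIGSEGV
"""
proc = subprocess.run(
    [sys.executable, "-c", crasher],
    capture_output=True,
    text=True,
)
print("Segmentation fault" in proc.stderr)
```

Without `faulthandler.enable()`, the child would die silently with no Python-level traceback, which is exactly the unhelpful situation the Spark config addresses.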
### Does this PR introduce _any_ user-facing change?
Yes, it makes the error message more actionable.
### How was this patch tested?
Manually tested as above.
### Was this patch authored or co-authored using generative AI tooling?
No.
Closes #43778 from HyukjinKwon/minor-error-improvement.
Authored-by: Hyukjin Kwon <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
---
.../main/scala/org/apache/spark/api/python/PythonRunner.scala | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala
b/core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala
index 1a01ad1bc219..d6363182606d 100644
--- a/core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala
+++ b/core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala
@@ -31,7 +31,7 @@ import scala.util.control.NonFatal
 import org.apache.spark._
 import org.apache.spark.internal.Logging
-import org.apache.spark.internal.config.{BUFFER_SIZE, EXECUTOR_CORES}
+import org.apache.spark.internal.config.{BUFFER_SIZE, EXECUTOR_CORES, Python}
 import org.apache.spark.internal.config.Python._
 import org.apache.spark.rdd.InputFileBlockHolder
 import org.apache.spark.resource.ResourceProfile.{EXECUTOR_CORES_LOCAL_PROPERTY, PYSPARK_MEMORY_LOCAL_PROPERTY}
@@ -549,6 +549,13 @@ private[spark] abstract class BasePythonRunner[IN, OUT](
           JavaFiles.deleteIfExists(path)
           throw new SparkException(s"Python worker exited unexpectedly (crashed): $error", e)
+        case eof: EOFException if !faultHandlerEnabled =>
+          throw new SparkException(
+            "Python worker exited unexpectedly (crashed). " +
+              "Consider setting 'spark.sql.execution.pyspark.udf.faulthandler.enabled' or " +
+              s"'${Python.PYTHON_WORKER_FAULTHANLDER_ENABLED.key}' configuration to 'true' for " +
+              "a better Python traceback.", eof)
+
         case eof: EOFException =>
           throw new SparkException("Python worker exited unexpectedly (crashed)", eof)
       }