[
https://issues.apache.org/jira/browse/SPARK-27548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Apache Spark reassigned SPARK-27548:
------------------------------------
Assignee: (was: Apache Spark)
> PySpark toLocalIterator does not raise errors from worker
> ---------------------------------------------------------
>
> Key: SPARK-27548
> URL: https://issues.apache.org/jira/browse/SPARK-27548
> Project: Spark
> Issue Type: Bug
> Components: PySpark
> Affects Versions: 2.4.1
> Reporter: Bryan Cutler
> Priority: Major
>
> When using a PySpark RDD local iterator and an error occurs on the worker, it
> is not picked up by Py4J and raised in the Python driver process. So unless
> looking at logs, there is no way for the application to know the worker had
> an error. This is a test that should pass if the error is raised in the
> driver:
> {code}
> def test_to_local_iterator_error(self):
> def fail(_):
> raise RuntimeError("local iterator error")
> rdd = self.sc.parallelize(range(10)).map(fail)
> with self.assertRaisesRegexp(Exception, "local iterator error"):
> for _ in rdd.toLocalIterator():
> pass{code}
> but it does not raise an exception:
> {noformat}
> Caused by: org.apache.spark.api.python.PythonException: Traceback (most
> recent call last):
> File "/home/bryan/git/spark/python/lib/pyspark.zip/pyspark/worker.py", line
> 428, in main
> process()
> File "/home/bryan/git/spark/python/lib/pyspark.zip/pyspark/worker.py", line
> 423, in process
> serializer.dump_stream(func(split_index, iterator), outfile)
> File "/home/bryan/git/spark/python/lib/pyspark.zip/pyspark/serializers.py",
> line 505, in dump_stream
> vs = list(itertools.islice(iterator, batch))
> File "/home/bryan/git/spark/python/lib/pyspark.zip/pyspark/util.py", line
> 99, in wrapper
> return f(*args, **kwargs)
> File "/home/bryan/git/spark/python/pyspark/tests/test_rdd.py", line 742, in
> fail
> raise RuntimeError("local iterator error")
> RuntimeError: local iterator error
> at
> org.apache.spark.api.python.BasePythonRunner$ReaderIterator.handlePythonException(PythonRunner.scala:453)
> ...
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> FAIL
> ======================================================================
> FAIL: test_to_local_iterator_error (pyspark.tests.test_rdd.RDDTests)
> ----------------------------------------------------------------------
> Traceback (most recent call last):
> File "/home/bryan/git/spark/python/pyspark/tests/test_rdd.py", line 748, in
> test_to_local_iterator_error
> pass
> AssertionError: Exception not raised{noformat}
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]