Shixiong Zhu created SPARK-33587:
------------------------------------
Summary: Kill the executor on nested fatal errors
Key: SPARK-33587
URL: https://issues.apache.org/jira/browse/SPARK-33587
Project: Spark
Issue Type: Improvement
Components: Spark Core
Affects Versions: 3.0.1
Reporter: Shixiong Zhu
Currently we kill the executor when hitting a fatal error. However, if the
fatal error is wrapped by another exception, such as
- java.util.concurrent.ExecutionException,
com.google.common.util.concurrent.UncheckedExecutionException, or
com.google.common.util.concurrent.ExecutionError when using a Guava cache or a
Java thread pool;
- a SparkException thrown from this line:
https://github.com/apache/spark/blob/cf98a761de677c733f3c33230e1c63ddb785d5c5/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatWriter.scala#L231
we still keep the executor running. Fatal errors (such as OutOfMemoryError) are
usually unrecoverable, and some components may be left in a broken state after
one is thrown. Hence, it's better to detect nested fatal errors as well and kill
the executor. Then we can rely on Spark's fault tolerance to recover.
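A minimal Scala sketch of the idea, not Spark's actual implementation: instead of checking only the top-level throwable, walk its cause chain (with a depth limit, which also protects against cyclic cause chains) and report whether any link is fatal. The names `isFatal`, `containsFatalError`, `maxDepth` (default 5), and `wrappedOomFromThreadPool` are hypothetical. The fatal predicate is an assumption built on `scala.util.control.NonFatal`, treating InterruptedException as non-fatal because it usually signals task cancellation. The OutOfMemoryError is thrown explicitly to simulate the failure, and a plain RuntimeException stands in for the SparkException wrapper.

```scala
import java.util.concurrent.{Callable, ExecutionException, Executors}
import scala.util.control.NonFatal

object NestedFatalErrorCheck {

  // Assumed "fatal" predicate: anything NonFatal rejects (e.g. VirtualMachineError
  // such as OutOfMemoryError, LinkageError), except InterruptedException.
  def isFatal(t: Throwable): Boolean = t match {
    case _: InterruptedException => false
    case NonFatal(_)             => false
    case _                       => true
  }

  // Checks `t` and up to `maxDepth` levels of its causes (hypothetical parameter).
  // The depth bound also stops the loop if the cause chain contains a cycle.
  def containsFatalError(t: Throwable, maxDepth: Int = 5): Boolean = {
    var current = t
    var depth = 0
    while (current != null && depth <= maxDepth) {
      if (isFatal(current)) return true
      current = current.getCause
      depth += 1
    }
    false
  }

  // Reproduces the wrapping described above: an error thrown inside a thread-pool
  // task surfaces from Future.get as an ExecutionException, which is then wrapped
  // once more, standing in for a SparkException-style wrapper.
  def wrappedOomFromThreadPool(): Throwable = {
    val pool = Executors.newSingleThreadExecutor()
    try {
      val future = pool.submit(new Callable[Int] {
        override def call(): Int = throw new OutOfMemoryError("simulated")
      })
      try {
        future.get()
        new IllegalStateException("expected the task to fail")
      } catch {
        case e: ExecutionException =>
          new RuntimeException("Task failed while writing rows", e)
      }
    } finally {
      pool.shutdown()
    }
  }

  def main(args: Array[String]): Unit = {
    val wrapped = wrappedOomFromThreadPool()
    println(isFatal(wrapped))            // false: the top-level exception looks harmless
    println(containsFatalError(wrapped)) // true: the OutOfMemoryError is two levels down
  }
}
```

With only the top-level check, the executor would keep running after this failure; the cause-chain check finds the OutOfMemoryError under the RuntimeException and the ExecutionException, so the executor could be killed and the task retried elsewhere.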
--
This message was sent by Atlassian Jira
(v8.3.4#803005)