[
https://issues.apache.org/jira/browse/SPARK-30310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dongjoon Hyun updated SPARK-30310:
----------------------------------
Fix Version/s: 2.4.5
> SparkUncaughtExceptionHandler halts running process unexpectedly
> ----------------------------------------------------------------
>
> Key: SPARK-30310
> URL: https://issues.apache.org/jira/browse/SPARK-30310
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 2.4.0, 3.0.0
> Reporter: Tin Hang To
> Assignee: Tin Hang To
> Priority: Major
> Fix For: 2.4.5, 3.0.0
>
>
> During 2.4.x testing, we saw many occasions where the Worker process would
> die unexpectedly, with the Worker log ending with:
>
> {{ERROR SparkUncaughtExceptionHandler: scala.MatchError: <...callstack...>}}
>
> We get the same callstack during our 2.3.x testing but the Worker process
> stays up.
> Comparing the 2.4.x SparkUncaughtExceptionHandler.scala with the 2.3.x
> version, we found that SPARK-24294 introduced the following change:
> {{exception match {}}
> {{  case _: OutOfMemoryError =>}}
> {{    System.exit(SparkExitCode.OOM)}}
> {{  case e: SparkFatalException if e.throwable.isInstanceOf[OutOfMemoryError] =>}}
> {{    // SPARK-24294: This is defensive code, in case that SparkFatalException is}}
> {{    // misused and uncaught.}}
> {{    System.exit(SparkExitCode.OOM)}}
> {{  case _ if exitOnUncaughtException =>}}
> {{    System.exit(SparkExitCode.UNCAUGHT_EXCEPTION)}}
> {{}}}
>
> This match has a guarded wildcard case (_ if exitOnUncaughtException) but no
> unguarded _ fallback. As a result, when exitOnUncaughtException is false
> (Master and Worker) and the exception does not match any of the cases (e.g.,
> IllegalStateException), Scala throws MatchError(exception), a MatchError
> wrapping the original exception. The enclosing catch block shown below then
> treats this as a second uncaught exception and halts the entire process with
> SparkExitCode.UNCAUGHT_EXCEPTION_TWICE.
>
> {{catch {}}
> {{  case oom: OutOfMemoryError => Runtime.getRuntime.halt(SparkExitCode.OOM)}}
> {{  case t: Throwable => Runtime.getRuntime.halt(SparkExitCode.UNCAUGHT_EXCEPTION_TWICE)}}
> {{}}}
>
> Therefore, even when exitOnUncaughtException is false, the process will halt.
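
The failure mode described above can be reproduced outside Spark with a small sketch. Everything here is a hypothetical stand-in: the object name, the Outcome values, and both methods are invented for illustration, and the real handler calls System.exit and Runtime.getRuntime.halt where this sketch returns a value. The second method shows one possible way to avoid the MatchError (an unguarded fallback case); it is an assumption about the shape of a fix, not a copy of the actual patch.

```scala
// Hypothetical stand-ins; none of these names come from the Spark code base.
object MatchErrorSketch {
  sealed trait Outcome
  case object ExitOom extends Outcome           // stands in for System.exit(SparkExitCode.OOM)
  case object ExitUncaught extends Outcome      // stands in for System.exit(SparkExitCode.UNCAUGHT_EXCEPTION)
  case object HaltUncaughtTwice extends Outcome // stands in for halt(SparkExitCode.UNCAUGHT_EXCEPTION_TWICE)
  case object KeepRunning extends Outcome       // the process is left alone

  // Same shape as the match in the report: the only wildcard case is guarded.
  def withoutFallback(exception: Throwable, exitOnUncaughtException: Boolean): Outcome =
    try {
      exception match {
        case _: OutOfMemoryError => ExitOom
        case _ if exitOnUncaughtException => ExitUncaught
        // No unguarded case: anything else makes the match throw scala.MatchError.
      }
    } catch {
      // The MatchError is itself a Throwable, so it lands here.
      case _: Throwable => HaltUncaughtTwice
    }

  // Assumed fix for illustration: add an unguarded fallback that does nothing.
  def withFallback(exception: Throwable, exitOnUncaughtException: Boolean): Outcome =
    try {
      exception match {
        case _: OutOfMemoryError => ExitOom
        case _ if exitOnUncaughtException => ExitUncaught
        case _ => KeepRunning
      }
    } catch {
      case _: Throwable => HaltUncaughtTwice
    }
}
```

Calling withoutFallback with an IllegalStateException and exitOnUncaughtException set to false reaches the catch block, while withFallback with the same arguments does not.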
--
This message was sent by Atlassian Jira
(v8.3.4#803005)