[ 
https://issues.apache.org/jira/browse/SPARK-30310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen resolved SPARK-30310.
----------------------------------
    Fix Version/s: 3.0.0
       Resolution: Fixed

Issue resolved by pull request 26955
[https://github.com/apache/spark/pull/26955]

> SparkUncaughtExceptionHandler halts running process unexpectedly
> ----------------------------------------------------------------
>
>                 Key: SPARK-30310
>                 URL: https://issues.apache.org/jira/browse/SPARK-30310
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.4.0, 3.0.0
>            Reporter: Tin Hang To
>            Assignee: Tin Hang To
>            Priority: Major
>             Fix For: 3.0.0
>
>
> During 2.4.x testing, the Worker process died unexpectedly on many
> occasions, with the Worker log ending with:
>  
> {{ERROR SparkUncaughtExceptionHandler: scala.MatchError:  <...callstack...>}}
>  
> We saw the same call stack during our 2.3.x testing, but there the Worker
> process stayed up.
> Comparing the 2.4.x SparkUncaughtExceptionHandler.scala with the 2.3.x
> version, we found that SPARK-24294 introduced the following change:
> {{exception match {}}
> {{  case _: OutOfMemoryError =>}}
> {{    System.exit(SparkExitCode.OOM)}}
> {{  case e: SparkFatalException if e.throwable.isInstanceOf[OutOfMemoryError] =>}}
> {{    // SPARK-24294: This is defensive code, in case that SparkFatalException is}}
> {{    // misused and uncaught.}}
> {{    System.exit(SparkExitCode.OOM)}}
> {{  case _ if exitOnUncaughtException =>}}
> {{    System.exit(SparkExitCode.UNCAUGHT_EXCEPTION)}}
> {{}}}
>  
> This match has a guarded {{case _ if exitOnUncaughtException}}, but no
> unguarded default case. As a result, when exitOnUncaughtException is false
> (Master and Worker) and the exception matches none of the cases (e.g.,
> IllegalStateException), Scala throws MatchError(exception), a MatchError
> that wraps the original exception. The enclosing catch block below then
> treats this as a second uncaught exception and halts the entire process
> with SparkExitCode.UNCAUGHT_EXCEPTION_TWICE.
>  
> {{catch {}}
> {{  case oom: OutOfMemoryError => Runtime.getRuntime.halt(SparkExitCode.OOM)}}
> {{  case t: Throwable => Runtime.getRuntime.halt(SparkExitCode.UNCAUGHT_EXCEPTION_TWICE)}}
> {{}}}
>  
> Therefore, even when exitOnUncaughtException is false, the process will halt.
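
The failure mode described above can be reproduced outside Spark with a minimal sketch. Everything in it is a hypothetical stand-in: the object name, the method names, and the returned strings (which replace the real System.exit and Runtime.halt calls) are assumptions made for illustration, and the unguarded default case in safeHandle is only one plausible remedy, not necessarily what the pull request changed.

```scala
// Illustrative stand-ins only; none of these names come from Spark itself.
object MatchErrorSketch {

  // Same shape as the quoted 2.4.x match: a guarded wildcard, no unguarded default.
  // Strings stand in for the System.exit calls so the sketch can run to completion.
  def buggyHandle(exception: Throwable, exitOnUncaughtException: Boolean): String =
    exception match {
      case _: OutOfMemoryError          => "exit(OOM)"
      case _ if exitOnUncaughtException => "exit(UNCAUGHT_EXCEPTION)"
      // No further case: when the guard is false, Scala throws scala.MatchError.
    }

  // Assumed remedy: add an unguarded default so the match always succeeds.
  def safeHandle(exception: Throwable, exitOnUncaughtException: Boolean): String =
    exception match {
      case _: OutOfMemoryError          => "exit(OOM)"
      case _ if exitOnUncaughtException => "exit(UNCAUGHT_EXCEPTION)"
      case _                            => "logged only"
    }

  // Stand-in for the enclosing try/catch that halts on a second failure.
  def outer(handle: (Throwable, Boolean) => String,
            t: Throwable,
            exitOnUncaughtException: Boolean): String =
    try handle(t, exitOnUncaughtException)
    catch {
      case _: OutOfMemoryError => "halt(OOM)"
      case _: Throwable        => "halt(UNCAUGHT_EXCEPTION_TWICE)"
    }

  def main(args: Array[String]): Unit = {
    val e = new IllegalStateException("boom")
    println(outer(buggyHandle, e, false)) // halt(UNCAUGHT_EXCEPTION_TWICE)
    println(outer(safeHandle, e, false))  // logged only
  }
}
```

With the flag set to false, the buggy variant reaches the halt branch because the MatchError is itself a Throwable, while the variant with a default case returns normally.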



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
