Tin Hang To created SPARK-30310:
-----------------------------------

             Summary: SparkUncaughtExceptionHandler halts running process unexpectedly
                 Key: SPARK-30310
                 URL: https://issues.apache.org/jira/browse/SPARK-30310
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 2.4.0, 3.0.0
            Reporter: Tin Hang To


During 2.4.x testing, we had many occasions where the Worker process died 
unexpectedly, with the Worker log ending with:

 

{{ERROR SparkUncaughtExceptionHandler: scala.MatchError:  <...callstack...>}}

 

We got the same call stack during our 2.3.x testing, but there the Worker 
process stayed up.

Comparing the 2.4.x SparkUncaughtExceptionHandler.scala with the 2.3.x 
version, we found that SPARK-24294 introduced the following change:


{{ exception match {}}
{{  case _: OutOfMemoryError =>}}
{{    System.exit(SparkExitCode.OOM)}}
{{  case e: SparkFatalException if e.throwable.isInstanceOf[OutOfMemoryError] =>}}
{{    // SPARK-24294: This is defensive code, in case that SparkFatalException is}}
{{    // misused and uncaught.}}
{{    System.exit(SparkExitCode.OOM)}}
{{  case _ if exitOnUncaughtException =>}}
{{    System.exit(SparkExitCode.UNCAUGHT_EXCEPTION)}}
{{}}}

 

This code has a guarded wildcard case (case _ if exitOnUncaughtException) but 
no unguarded wildcard (case _) to fall back on. As a result, when 
exitOnUncaughtException is false (as it is for Master and Worker) and the 
exception matches none of the cases (e.g., IllegalStateException), Scala 
throws MatchError(exception), a MatchError wrapping the original exception. 
The outer catch block further down then treats this as a second uncaught 
exception and halts the entire process with 
SparkExitCode.UNCAUGHT_EXCEPTION_TWICE.
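The fall-through can be reproduced outside Spark. Below is a minimal Scala sketch, not Spark code: `GuardedWildcardDemo`, `classify` and `exitOnUncaught` are hypothetical names. The only wildcard case is guarded, so a non-OOM throwable with the guard set to false raises scala.MatchError:

```scala
import scala.util.{Failure, Success, Try}

// Hypothetical demo object; it only mimics the shape of the match above.
object GuardedWildcardDemo {
  def classify(t: Throwable, exitOnUncaught: Boolean): String = t match {
    case _: OutOfMemoryError => "oom"
    case _ if exitOnUncaught => "uncaught"
    // No unguarded `case _` here: a non-OOM throwable with
    // exitOnUncaught = false matches nothing and raises scala.MatchError.
  }

  def main(args: Array[String]): Unit = {
    val result = Try(classify(new IllegalStateException("boom"), exitOnUncaught = false))
    result match {
      case Failure(m: MatchError) => println(s"MatchError wrapping: ${m.getMessage}")
      case Success(v)             => println(s"matched: $v")
      case Failure(other)         => println(s"other failure: $other")
    }
  }
}
```

Because Throwable is not a sealed type, the compiler gives no exhaustiveness warning here, so the gap only shows up at runtime.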

 

{{catch {}}
{{  case oom: OutOfMemoryError => Runtime.getRuntime.halt(SparkExitCode.OOM)}}
{{  case t: Throwable => Runtime.getRuntime.halt(SparkExitCode.UNCAUGHT_EXCEPTION_TWICE)}}
{{}}}

 

Therefore, even when exitOnUncaughtException is false, the process will halt.
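To show how the inner MatchError turns into a halt, the handler's control flow can be modeled in plain Scala. This is a sketch under stated assumptions, not Spark's implementation. `HandlerModel`, `handle` and `withFallback` are hypothetical. System.exit and Runtime.halt are replaced by recorded strings so nothing actually terminates. The SparkFatalException case is omitted, and the exit code values are assumed to match SparkExitCode. The `withFallback = true` branch shows one possible remedy: an unguarded `case _` that does nothing when exitOnUncaughtException is false.

```scala
import scala.collection.mutable.ListBuffer

// Hypothetical model of the handler; records actions instead of exiting.
object HandlerModel {
  // Assumed values; check SparkExitCode for the authoritative ones.
  val OOM = 52
  val UncaughtException = 50
  val UncaughtExceptionTwice = 51

  /** Returns the actions the handler would take, e.g. List("log ...", "halt 51"). */
  def handle(exception: Throwable, exitOnUncaughtException: Boolean, withFallback: Boolean): List[String] = {
    val actions = ListBuffer.empty[String]
    def recordExit(code: Int): Unit = actions += s"exit $code"
    def recordHalt(code: Int): Unit = actions += s"halt $code"
    try {
      // Stand-in for logging the uncaught exception.
      actions += s"log ${exception.getClass.getSimpleName}"
      if (withFallback) {
        exception match {
          case _: OutOfMemoryError => recordExit(OOM)
          case _ if exitOnUncaughtException => recordExit(UncaughtException)
          case _ => // Possible fix: do nothing and keep the process running.
        }
      } else {
        // Same shape as the reported 2.4.x code: no unguarded fallback.
        exception match {
          case _: OutOfMemoryError => recordExit(OOM)
          case _ if exitOnUncaughtException => recordExit(UncaughtException)
        }
      }
    } catch {
      // Outer catch: a MatchError from the match above lands here.
      case _: OutOfMemoryError => recordHalt(OOM)
      case _: Throwable => recordHalt(UncaughtExceptionTwice)
    }
    actions.toList
  }
}
```

With exitOnUncaughtException = false and withFallback = false, handling an IllegalStateException ends with "halt 51", mirroring the reported Worker behavior. With withFallback = true, the same call only logs the exception.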



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
