[
https://issues.apache.org/jira/browse/LIVY-712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16972668#comment-16972668
]
Michal Sankot commented on LIVY-712:
------------------------------------
As I search through the code I see a change in SparkYarnApp.scala at line 295
between 0.4.0 and 0.5.0 (same in 0.6.0).
0.4.0:
{{ } catch {}}
{{ case e: InterruptedException =>}}
{{ yarnDiagnostics = ArrayBuffer("Session stopped by user.")}}
{{ changeState(SparkApp.State.KILLED)}}
{{ case e: Throwable =>}}
{{ error(s"Error whiling refreshing YARN state: $e")}}
{{ yarnDiagnostics = ArrayBuffer(e.toString, e.getStackTrace().mkString(" "))}}
{{ changeState(SparkApp.State.FAILED)}}
}
0.5.0/0.6.0:
{{ } catch {}}
{{ case _: InterruptedException =>}}
{{ yarnDiagnostics = ArrayBuffer("Session stopped by user.")}}
{{ changeState(SparkApp.State.KILLED)}}
{{ case NonFatal(e) =>}}
{{ error(s"Error whiling refreshing YARN state", e)}}
{{ yarnDiagnostics = ArrayBuffer(e.getMessage)}}
{{ changeState(SparkApp.State.FAILED)}}
}
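So it seems that in 0.5.0+ fatal exceptions (anything NonFatal does not match,
such as OutOfMemoryError or LinkageError) are no longer handled by this catch
block and so don't make the app go into the FAILED state. That looks like a
bug. Is it so?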
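To illustrate the difference, here is a minimal standalone sketch of which throwables scala.util.control.NonFatal matches. It is not Livy code: the NonFatalDemo object, the classify helper and the "UNHANDLED" label are made up for this example, and the returned strings only stand in for the state changes in the snippets above.

```scala
import scala.util.control.NonFatal

// Hypothetical demo object, not part of Livy. It mirrors the shape of the
// 0.5.0/0.6.0 catch block and adds a final catch-all so the throwables that
// NonFatal skips become visible.
object NonFatalDemo {
  def classify(t: Throwable): String =
    try {
      throw t
    } catch {
      // Same first case as in SparkYarnApp: the thread was interrupted.
      case _: InterruptedException => "KILLED"
      // Matches ordinary exceptions, but not VirtualMachineError,
      // LinkageError, ThreadDeath, InterruptedException or ControlThrowable.
      case NonFatal(_) => "FAILED"
      // In the 0.5.0/0.6.0 block this case does not exist, so such errors
      // would propagate out of the block instead of being handled there.
      case _: Throwable => "UNHANDLED"
    }

  def main(args: Array[String]): Unit = {
    println(classify(new RuntimeException("boom")))
    println(classify(new OutOfMemoryError("oom")))
    println(classify(new NoClassDefFoundError("missing")))
    println(classify(new InterruptedException()))
  }
}
```

With the 0.4.0-style {{case e: Throwable}} the two error cases would have ended up in the FAILED branch as well, which is the behaviour change I am asking about.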
> EMR 5.23/5.27 - Livy does not recognise that Spark job failed
> -------------------------------------------------------------
>
> Key: LIVY-712
> URL: https://issues.apache.org/jira/browse/LIVY-712
> Project: Livy
> Issue Type: Bug
> Components: API
> Affects Versions: 0.5.0, 0.6.0
> Environment: AWS EMR 5.23/5.27, Scala
> Reporter: Michal Sankot
> Priority: Major
> Labels: EMR, api, spark
>
> We've upgraded from AWS EMR 5.13 -> 5.23 (Livy 0.4.0 -> 0.5.0, Spark 2.3.0 ->
> 2.4.0) and an issue has appeared: when an exception is thrown during
> Spark job execution, Spark shuts down as if there was no problem and the job
> appears as Completed in EMR. So we're not notified when the system crashes.
> The same problem appears in EMR 5.27 (Livy 0.6.0, Spark 2.4.4).
> Is it something with Spark? Or a known issue with Livy?
> In Livy logs I see that spark-submit exits with error code 1
> {quote}{{05:34:59 WARN BatchSession$: spark-submit exited with code 1}}
> {quote}
> And then Livy API states that batch state is
> {quote}{{"state": "success"}}
> {quote}
> How can it be made to work again?