[ https://issues.apache.org/jira/browse/SPARK-33669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Su Qilong updated SPARK-33669:
------------------------------
Description:
In YARN client mode, stopping YarnClientSchedulerBackend first interrupts the
YARN application monitor thread. MonitorThread.run() catches
InterruptedException so that it can respond gracefully to the stop request.
However, if the interrupt arrives while a Hadoop RPC call is in progress,
client.monitorApplication hits an InterruptedIOException instead. In this case
MonitorThread does not know it was interrupted: a failed YARN application
report is returned, and "Failed to contact YARN for application xxxxx" and
"YARN application has exited unexpectedly with state xxxxx" are logged at error
level, which confuses users.
We should handle InterruptedIOException here so that it behaves the same as
InterruptedException.
{code:java}
private class MonitorThread extends Thread {
  private var allowInterrupt = true

  override def run() {
    try {
      val YarnAppReport(_, state, diags) =
        client.monitorApplication(appId.get, logApplicationReport = false)
      logError(s"YARN application has exited unexpectedly with state $state! " +
        "Check the YARN application logs for more details.")
      diags.foreach { err =>
        logError(s"Diagnostics message: $err")
      }
      allowInterrupt = false
      sc.stop()
    } catch {
      // Only InterruptedException is treated as a stop request here.
      case e: InterruptedException => logInfo("Interrupting monitor thread")
    }
  }
}
{code}
{code:java}
2020-12-05 03:06:58,000 ERROR [YARN application state monitor]: org.apache.spark.deploy.yarn.Client(91) - Failed to contact YARN for application application_1605868815011_1154961. - sessionId[99e46a14-7995-41da-ba0a-c4c7387728a4]
java.io.InterruptedIOException: Call interrupted
    at org.apache.hadoop.ipc.Client.call(Client.java:1466)
    at org.apache.hadoop.ipc.Client.call(Client.java:1409)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
    at com.sun.proxy.$Proxy38.getApplicationReport(Unknown Source)
    at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getApplicationReport(ApplicationClientProtocolPBClientImpl.java:187)
    at sun.reflect.GeneratedMethodAccessor22.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:256)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
    at com.sun.proxy.$Proxy39.getApplicationReport(Unknown Source)
    at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getApplicationReport(YarnClientImpl.java:408)
    at org.apache.spark.deploy.yarn.Client.getApplicationReport(Client.scala:327)
    at org.apache.spark.deploy.yarn.Client.monitorApplication(Client.scala:1039)
    at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend$MonitorThread.run(YarnClientSchedulerBackend.scala:116)
2020-12-05 03:06:58,000 ERROR [YARN application state monitor]: org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend(70) - YARN application has exited unexpectedly with state FAILED! Check the YARN application logs for more details. - sessionId[99e46a14-7995-41da-ba0a-c4c7387728a4]
2020-12-05 03:06:58,000 INFO [FrontendService-Handler-Pool: Thread-6560]: org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend(54) - Shutting down all executors - sessionId[99e46a14-7995-41da-ba0a-c4c7387728a4]
2020-12-05 03:06:58,001 ERROR [YARN application state monitor]: org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend(70) - Diagnostics message: Failed to contact YARN for application application_1605868815011_1154961.
{code}
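One possible shape for the handling requested above is sketched here. It is a
self-contained illustration rather than the actual Spark change: the object
MonitorInterruptSketch, the Outcome type, and the fetchState parameter are
hypothetical stand-ins, and the simplified monitorApplication only mimics the
behavior visible in the log (an RPC failure being turned into a FAILED report).
The sketch assumes that the inner handler uses something like
scala.util.control.NonFatal, which matches InterruptedIOException but not
InterruptedException.

```scala
import java.io.InterruptedIOException
import scala.util.control.NonFatal

object MonitorInterruptSketch {
  // Hypothetical result type: either the monitor was interrupted on purpose,
  // or the application really ended in some state.
  sealed trait Outcome
  case object Interrupted extends Outcome
  final case class Exited(state: String) extends Outcome

  // Hypothetical stand-in for client.monitorApplication; `fetchState` plays
  // the role of the Hadoop RPC call that fetches the application report.
  def monitorApplication(fetchState: () => String): String =
    try fetchState()
    catch {
      // Let an interrupted RPC propagate instead of reporting FAILED.
      case e: InterruptedIOException => throw e
      // Any other non-fatal failure is still reported as a FAILED application.
      case NonFatal(_) => "FAILED"
    }

  // Hypothetical stand-in for the body of MonitorThread.run().
  def runMonitor(fetchState: () => String): Outcome =
    try Exited(monitorApplication(fetchState))
    catch {
      // Both exception types mean the thread was asked to stop.
      case _: InterruptedException | _: InterruptedIOException => Interrupted
    }
}
```

With this arrangement an interrupt that lands inside the RPC call takes the
same quiet path as a plain InterruptedException, while a genuine I/O failure
is still reported as FAILED.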
was:
In YARN client mode, stopping YarnClientSchedulerBackend first interrupts the
YARN application monitor thread. MonitorThread.run() catches
InterruptedException so that it can respond gracefully to the stop request.
However, if the interrupt arrives while a Hadoop RPC call is in progress,
client.monitorApplication hits an InterruptedIOException instead. In this case
MonitorThread does not know it was interrupted: a failed YARN application
report is returned, and "Failed to contact YARN for application xxxxx" and
"YARN application has exited unexpectedly with state xxxxx" are logged at error
level, which confuses users.
We should handle InterruptedIOException here so that it behaves the same as
InterruptedException.
{code:java}
private class MonitorThread extends Thread {
  private var allowInterrupt = true

  override def run() {
    try {
      val YarnAppReport(_, state, diags) =
        client.monitorApplication(appId.get, logApplicationReport = false)
      logError(s"YARN application has exited unexpectedly with state $state! " +
        "Check the YARN application logs for more details.")
      diags.foreach { err =>
        logError(s"Diagnostics message: $err")
      }
      allowInterrupt = false
      sc.stop()
    } catch {
      // Only InterruptedException is treated as a stop request here.
      case e: InterruptedException => logInfo("Interrupting monitor thread")
    }
  }
}
{code}
{noformat}
// error message
2020-12-05 03:06:58,000 ERROR [YARN application state monitor]: org.apache.spark.deploy.yarn.Client(91) - Failed to contact YARN for application application_1605868815011_1154961.
java.io.InterruptedIOException: Call interrupted
    at org.apache.hadoop.ipc.Client.call(Client.java:1466)
    at org.apache.hadoop.ipc.Client.call(Client.java:1409)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
    at com.sun.proxy.$Proxy38.getApplicationReport(Unknown Source)
    at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getApplicationReport(ApplicationClientProtocolPBClientImpl.java:187)
    at sun.reflect.GeneratedMethodAccessor22.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:256)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
    at com.sun.proxy.$Proxy39.getApplicationReport(Unknown Source)
    at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getApplicationReport(YarnClientImpl.java:408)
    at org.apache.spark.deploy.yarn.Client.getApplicationReport(Client.scala:327)
    at org.apache.spark.deploy.yarn.Client.monitorApplication(Client.scala:1039)
    at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend$MonitorThread.run(YarnClientSchedulerBackend.scala:116)
2020-12-05 03:06:58,000 ERROR [YARN application state monitor]: org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend(70) - YARN application has exited unexpectedly with state FAILED! Check the YARN application logs for more details.
2020-12-05 03:06:58,001 ERROR [YARN application state monitor]: org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend(70) - Diagnostics message: Failed to contact YARN for application application_1605868815011_1154961.
{noformat}
> Wrong error message from YARN application state monitor when sc.stop in yarn
> client mode
> ----------------------------------------------------------------------------------------
>
> Key: SPARK-33669
> URL: https://issues.apache.org/jira/browse/SPARK-33669
> Project: Spark
> Issue Type: Bug
> Components: YARN
> Affects Versions: 2.4.3, 3.0.1
> Reporter: Su Qilong
> Priority: Minor