[ https://issues.apache.org/jira/browse/SPARK-40320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mars updated SPARK-40320:
-------------------------
Description:
Steps to reproduce:
Set `spark.plugins=ErrorSparkPlugin`, using the following `ErrorSparkPlugin` and `ErrorExecutorPlugin` classes (code abbreviated for clarity):
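{code:java}
import org.apache.spark.api.plugin.{DriverPlugin, ExecutorPlugin, SparkPlugin}

// The driver-side plugin's body is not shown in this report; a no-op is assumed here.
class ErrorDriverPlugin extends DriverPlugin

class ErrorSparkPlugin extends SparkPlugin {
  override def driverPlugin(): DriverPlugin = new ErrorDriverPlugin()

  // Executor-side component whose init() fails (see below).
  override def executorPlugin(): ExecutorPlugin = new ErrorExecutorPlugin()
}{code}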
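{code:java}
import java.util

import org.apache.spark.api.plugin.{ExecutorPlugin, PluginContext}
import org.apache.spark.internal.Logging

class ErrorExecutorPlugin extends ExecutorPlugin with Logging {
  private val checkingInterval: Long = 1

  override def init(_ctx: PluginContext, extraConf: util.Map[String, String]): Unit = {
    // Always true: simulates a native library failing to load during plugin init.
    if (checkingInterval == 1) {
      throw new UnsatisfiedLinkError("LCL my Exception error2")
    }
  }
}{code}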
The Spark UI shows the executor as active, but it is broken and never receives any tasks.
Root Cause:
Looking at the code, `org.apache.spark.rpc.netty.Inbox#safelyCall` rethrows fatal errors through `dealWithFatalError` instead of handling them. `UnsatisfiedLinkError` counts as fatal here because it is a `LinkageError`, which Scala's `NonFatal` extractor treats as fatal. As a result, the `CoarseGrainedExecutorBackend` JVM process stays alive, but its communication thread no longer works.
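A minimal, self-contained sketch of this failure mode, under stated assumptions: `FatalErrorLoopDemo`, its `safelyCall` and `dispatch` are hypothetical names, and the single-threaded loop is a deliberately simplified stand-in for Spark's RPC inbox, not its actual code. It only mimics the pattern described above (handle non-fatal errors, rethrow fatal ones) to show that once a fatal error escapes the dispatcher thread, that thread is gone while the rest of the JVM keeps running, so later messages are never processed.

```scala
import java.util.concurrent.LinkedBlockingQueue
import java.util.concurrent.atomic.AtomicInteger
import scala.util.control.NonFatal

// Hypothetical, heavily simplified model of a single-threaded RPC inbox.
// This is NOT Spark's Inbox/MessageLoop code; it only mimics the pattern
// "handle non-fatal errors, rethrow fatal ones".
object FatalErrorLoopDemo {

  // Non-fatal exceptions are swallowed (standing in for "logged and handled");
  // fatal ones (anything NonFatal does not match) propagate to the caller.
  def safelyCall(action: () => Unit): Unit =
    try action()
    catch {
      case NonFatal(_) => ()
    }

  /** Feeds `msgs` to one dispatcher thread; returns (messages processed, thread still alive). */
  def dispatch(msgs: Seq[() => Unit]): (Int, Boolean) = {
    val queue = new LinkedBlockingQueue[() => Unit]()
    val processed = new AtomicInteger(0)

    val dispatcher = new Thread(new Runnable {
      def run(): Unit =
        while (true) {
          val msg = queue.take()
          safelyCall(msg) // a fatal error escapes here and ends the loop for good
          processed.incrementAndGet()
        }
    }, "dispatcher")
    dispatcher.setDaemon(true)
    // Silence the default stack-trace printout; the thread still dies.
    dispatcher.setUncaughtExceptionHandler(new Thread.UncaughtExceptionHandler {
      def uncaughtException(t: Thread, e: Throwable): Unit = ()
    })
    dispatcher.start()

    msgs.foreach(m => queue.put(m))
    dispatcher.join(2000) // returns early only if the dispatcher thread has died
    (processed.get(), dispatcher.isAlive)
  }

  def main(args: Array[String]): Unit = {
    // UnsatisfiedLinkError is a LinkageError, which NonFatal treats as fatal.
    println(NonFatal(new UnsatisfiedLinkError("x")))  // false
    println(NonFatal(new IllegalStateException("x"))) // true

    val ok: () => Unit = () => ()
    val recoverable: () => Unit = () => throw new IllegalStateException("recoverable")
    val fatal: () => Unit = () => throw new UnsatisfiedLinkError("LCL my Exception error2")

    // The loop survives a non-fatal error and keeps processing.
    println(dispatch(Seq(ok, recoverable, ok))) // (3,true)
    // After a fatal error the dispatcher thread is gone, yet main keeps running
    // and the last message is never processed.
    println(dispatch(Seq(ok, fatal, ok)))       // (1,false)
  }
}
```

In this sketch the second call loses its only dispatcher thread while the process itself carries on, which is analogous to the executor that still looks "active" but never receives tasks.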
> When the Executor plugin fails to initialize, the Executor shows active but does not accept tasks forever, just like being hung
> -------------------------------------------------------------------------------------------------------------------------------
>
> Key: SPARK-40320
> URL: https://issues.apache.org/jira/browse/SPARK-40320
> Project: Spark
> Issue Type: Bug
> Components: Scheduler
> Affects Versions: 3.0.0
> Reporter: Mars
> Priority: Major
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)