[ https://issues.apache.org/jira/browse/SPARK-40320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mars updated SPARK-40320:
-------------------------
Description:

Steps to reproduce:
Set `spark.plugins=ErrorSparkPlugin`.

The `ErrorSparkPlugin` and `ErrorExecutorPlugin` classes (abbreviated for clarity):
{code:java}
class ErrorSparkPlugin extends SparkPlugin {
  override def driverPlugin(): DriverPlugin = new ErrorDriverPlugin()

  override def executorPlugin(): ExecutorPlugin = new ErrorExecutorPlugin()
}
{code}
{code:java}
class ErrorExecutorPlugin extends ExecutorPlugin with Logging {
  private val checkingInterval: Long = 1

  override def init(_ctx: PluginContext, extraConf: util.Map[String, String]): Unit = {
    if (checkingInterval == 1) {
      throw new UnsatisfiedLinkError("LCL my Exception error2")
    }
  }
}
{code}
The executor shows as active in the Spark UI, but it is actually broken and never receives any tasks.

Root cause:
Inspecting the code shows that `org.apache.spark.rpc.netty.Inbox#safelyCall` rethrows fatal errors (`UnsatisfiedLinkError` counts as fatal here) in its `dealWithFatalError` method. As a result, the `CoarseGrainedExecutorBackend` JVM process stays alive, but its communication thread is no longer running.
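This failure mode can be sketched outside Spark with a minimal standalone Java program (a simplified analogy, not Spark's actual code; the class and field names here are hypothetical): a dedicated message-loop thread dies when a fatal `Error` escapes its handler, while the JVM process itself stays alive and looks healthy from the outside.
{code:java}
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class FatalErrorDemo {
    // Hypothetical message queue, analogous to the executor's RPC inbox.
    static final BlockingQueue<Runnable> inbox = new LinkedBlockingQueue<>();
    static volatile boolean loopAlive = true;

    public static void main(String[] args) throws InterruptedException {
        Thread dispatch = new Thread(() -> {
            try {
                while (true) {
                    inbox.take().run(); // deliver the next message
                }
            } catch (Throwable t) {
                // A fatal Error (e.g. UnsatisfiedLinkError) escapes here and
                // kills only this thread; the JVM process keeps running.
                loopAlive = false;
            }
        });
        dispatch.start();

        // The first "message" throws a fatal error, like the failing plugin init.
        inbox.offer(() -> { throw new UnsatisfiedLinkError("plugin init failed"); });
        dispatch.join();

        // The process is still alive here, but no further messages
        // will ever be handled - the same "hung but active" symptom.
        System.out.println("loopAlive=" + loopAlive);
    }
}
{code}
This mirrors why the executor appears active: process liveness checks pass, but the thread that would accept task messages is already dead.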
> When the Executor plugin fails to initialize, the Executor shows active but
> does not accept tasks forever, just like being hung
> -------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-40320
>                 URL: https://issues.apache.org/jira/browse/SPARK-40320
>             Project: Spark
>          Issue Type: Bug
>          Components: Scheduler
>    Affects Versions: 3.0.0
>            Reporter: Mars
>            Priority: Major
--
This message was sent by Atlassian Jira
(v8.20.10#820010)