[ 
https://issues.apache.org/jira/browse/SPARK-40320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mars updated SPARK-40320:
-------------------------
    Description: 
Steps to reproduce:
Set `spark.plugins=ErrorSparkPlugin`.
The `ErrorSparkPlugin` and `ErrorExecutorPlugin` classes (abbreviated for 
clarity):
{code:java}
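import org.apache.spark.api.plugin.{DriverPlugin, ExecutorPlugin, SparkPlugin}

class ErrorSparkPlugin extends SparkPlugin {
  // ErrorDriverPlugin is not shown in this abbreviated report.
  override def driverPlugin(): DriverPlugin = new ErrorDriverPlugin()

  // Executor-side plugin whose init() fails.
  override def executorPlugin(): ExecutorPlugin = new ErrorExecutorPlugin()
}{code}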
{code:java}
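import java.util

import org.apache.spark.api.plugin.{ExecutorPlugin, PluginContext}
import org.apache.spark.internal.Logging

class ErrorExecutorPlugin extends ExecutorPlugin with Logging {
  private val checkingInterval: Long = 1

  // checkingInterval is fixed at 1, so init() always throws.
  // UnsatisfiedLinkError is a subclass of LinkageError.
  override def init(
      _ctx: PluginContext,
      extraConf: util.Map[String, String]): Unit = {
    if (checkingInterval == 1) {
      throw new UnsatisfiedLinkError("LCL my Exception error2")
    }
  }
} {code}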
The executor appears active in the Spark UI, but it is broken and never 
receives any tasks.
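
For reference, here is one way the `spark.plugins` setting from the reproduce step could be applied programmatically. This is only a sketch: the object name and application name are hypothetical, and the value `ErrorSparkPlugin` assumes the class lives in the default package (otherwise the fully qualified class name would be needed). Passing `--conf spark.plugins=ErrorSparkPlugin` to `spark-submit` should have the same effect.

```scala
import org.apache.spark.SparkConf

object PluginReproConf {
  // Builds a SparkConf that registers the failing plugin from this report.
  // "plugin-init-failure-repro" is a hypothetical application name.
  def build(): SparkConf =
    new SparkConf()
      .setAppName("plugin-init-failure-repro")
      .set("spark.plugins", "ErrorSparkPlugin")
}
```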

Root Cause:

Looking at the code, `org.apache.spark.rpc.netty.Inbox#safelyCall` throws the 
fatal error (`UnsatisfiedLinkError` is treated as fatal here) from its 
`dealWithFatalError` method. The `CoarseGrainedExecutorBackend` JVM process 
stays alive, but its communication thread is no longer working.
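
To illustrate why `UnsatisfiedLinkError` ends up on the fatal path, here is a minimal, self-contained sketch. It is not Spark's `Inbox` code: `FatalErrorSketch`, `safelyCallSketch` and `escapesHandler` are hypothetical names, and the sketch assumes the handler separates recoverable from fatal errors with `scala.util.control.NonFatal`, which does not match `LinkageError` subclasses. The plain thread at the end only stands in for a message-processing thread, to show that such a thread can die while the JVM keeps running.

```scala
import scala.util.control.NonFatal

object FatalErrorSketch {
  // Hypothetical stand-in for a "safely call" wrapper: only NonFatal errors
  // are handled here; anything else propagates to the caller.
  def safelyCallSketch(action: => Unit): String =
    try {
      action
      "ok"
    } catch {
      case NonFatal(e) => s"handled: ${e.getClass.getSimpleName}"
    }

  // True when the throwable is not matched by NonFatal and escapes the wrapper.
  def escapesHandler(t: Throwable): Boolean =
    try {
      safelyCallSketch(throw t)
      false
    } catch {
      case _: Throwable => true
    }

  def main(args: Array[String]): Unit = {
    // A RuntimeException is matched by NonFatal and handled in place.
    println(safelyCallSketch(throw new RuntimeException("recoverable")))

    // UnsatisfiedLinkError is a LinkageError, so NonFatal does not match it.
    val error = new UnsatisfiedLinkError("LCL my Exception error2")
    println(s"UnsatisfiedLinkError escapes: ${escapesHandler(error)}")

    // A worker thread that hits the error terminates, but the JVM stays up.
    val worker = new Thread(new Runnable {
      def run(): Unit = {
        safelyCallSketch(throw new UnsatisfiedLinkError("LCL my Exception error2"))
        ()
      }
    })
    worker.setUncaughtExceptionHandler(new Thread.UncaughtExceptionHandler {
      def uncaughtException(t: Thread, e: Throwable): Unit =
        println(s"worker died with ${e.getClass.getSimpleName}")
    })
    worker.start()
    worker.join()
    println(s"worker alive: ${worker.isAlive}; main thread still running")
  }
}
```

The main thread keeps running after the worker has terminated, which resembles the symptom described above: the process looks alive from the outside while the thread that should process messages is gone.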

 

  was:
Steps to reproduce:
Set `spark.plugins=ErrorSparkPlugin`.
The `ErrorSparkPlugin` and `ErrorExecutorPlugin` classes (abbreviated for 
clarity):
{code:java}
class ErrorSparkPlugin extends SparkPlugin {
  /**
   */
  override def driverPlugin(): DriverPlugin =  new ErrorDriverPlugin()

  /**
   */
  override def executorPlugin(): ExecutorPlugin = new ErrorExecutorPlugin()
}{code}
{code:java}
class ErrorExecutorPlugin extends ExecutorPlugin with Logging {
  private val checkingInterval: Long = 1

  override def init(
      _ctx: PluginContext,
      extraConf: util.Map[String, String]): Unit = {
    if (checkingInterval == 1) {
      throw new UnsatisfiedLinkError("LCL my Exception error2")
    }
  }
} {code}
The executor appears active in the Spark UI, but it is broken and never 
receives any tasks.

Root Cause:

Looking at the code, `org.apache.spark.rpc.netty.Inbox#safelyCall` throws the 
fatal error (`UnsatisfiedLinkError` is treated as fatal here) from its 
`dealWithFatalError` method. Actually the executor 

 


> When the Executor plugin fails to initialize, the Executor shows active but 
> does not accept tasks forever, just like being hung
> -------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-40320
>                 URL: https://issues.apache.org/jira/browse/SPARK-40320
>             Project: Spark
>          Issue Type: Bug
>          Components: Scheduler
>    Affects Versions: 3.0.0
>            Reporter: Mars
>            Priority: Major
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

