Sumeet created SPARK-34949:
------------------------------

             Summary: Executor.reportHeartBeat reregisters blockManager even when Executor is shutting down
                 Key: SPARK-34949
                 URL: https://issues.apache.org/jira/browse/SPARK-34949
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 3.2.0
         Environment: Resource Manager: K8s
            Reporter: Sumeet


*Problem:*

I was testing Dynamic Allocation on K8s with about 300 executors. When the executors were torn down due to "spark.dynamicAllocation.executorIdleTimeout", all the executor pods were removed from K8s; however, the "Executors" tab in the Spark UI still listed some executors as alive.

[spark.sparkContext.statusTracker.getExecutorInfos.length|https://github.com/apache/spark/blob/65da9287bc5112564836a555cd2967fc6b05856f/core/src/main/scala/org/apache/spark/SparkStatusTracker.scala#L100] also returned a value greater than 1, even though it should return 1 (the driver only) once every executor has been removed.


*Cause:*
 * "CoarseGrainedSchedulerBackend" issues RemoveExecutor on an "executorEndpoint" and publishes "SparkListenerExecutorRemoved" on the "listenerBus".
 * "CoarseGrainedExecutorBackend" starts the executor shutdown.
 * "HeartbeatReceiver" picks up the "SparkListenerExecutorRemoved" event and removes the executor from "executorLastSeen".
 * In the meantime, the executor reports a heartbeat. "HeartbeatReceiver" can no longer find the "executorId" in "executorLastSeen" and therefore responds with "HeartbeatResponse(reregisterBlockManager = true)".
 * The executor then calls "env.blockManager.reregister()" and re-registers itself, creating an inconsistency: the driver lists the executor as alive even though its pod has already been removed.
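The race described in the steps above can be reproduced with a small, single-threaded model of the driver-side bookkeeping. This is a sketch under assumptions: "SimplifiedHeartbeatReceiver", "onExecutorAdded" and "RaceDemo" are hypothetical names, and the "Heartbeat"/"HeartbeatResponse" case classes are simplified stand-ins for Spark's real messages (which carry more fields). It only illustrates why a heartbeat from an executorId missing from "executorLastSeen" gets "reregisterBlockManager = true".

```scala
import scala.collection.mutable

// Hypothetical, simplified stand-ins for Spark's heartbeat messages.
final case class Heartbeat(executorId: String)
final case class HeartbeatResponse(reregisterBlockManager: Boolean)

// Hypothetical model of the driver-side bookkeeping; not Spark's actual class.
class SimplifiedHeartbeatReceiver {
  private val executorLastSeen = mutable.Map.empty[String, Long]

  def onExecutorAdded(executorId: String, now: Long): Unit =
    executorLastSeen(executorId) = now

  // Models the handling of SparkListenerExecutorRemoved: forget the executor.
  def onExecutorRemoved(executorId: String): Unit = {
    executorLastSeen.remove(executorId)
    ()
  }

  def receive(hb: Heartbeat, now: Long): HeartbeatResponse =
    if (executorLastSeen.contains(hb.executorId)) {
      // Known executor: refresh its timestamp, no re-registration needed.
      executorLastSeen(hb.executorId) = now
      HeartbeatResponse(reregisterBlockManager = false)
    } else {
      // Unknown executor: ask it to re-register its BlockManager.
      HeartbeatResponse(reregisterBlockManager = true)
    }
}

object RaceDemo {
  def run(): HeartbeatResponse = {
    val receiver = new SimplifiedHeartbeatReceiver
    receiver.onExecutorAdded("exec-1", now = 0L)
    // The driver removes the idle executor...
    receiver.onExecutorRemoved("exec-1")
    // ...but a heartbeat sent while the executor is shutting down still arrives.
    receiver.receive(Heartbeat("exec-1"), now = 10L)
  }
}
```

In this model, RaceDemo.run() returns a response asking the already-removed executor to re-register, which is the step that brings it back into the driver's view.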


*Proposed Solution:*

The "reportHeartBeat" method is not aware that the Executor is shutting down. It should check "executorShutdown" before re-registering the BlockManager.
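A minimal sketch of the proposed guard, assuming a simplified executor model: "SketchExecutor", "handleHeartbeatResponse", "reregisterBlockManager" and the "reregistrations" counter are hypothetical names, and "HeartbeatResponse" is a stand-in for Spark's message. Only the idea of consulting an "executorShutdown" flag before re-registering comes from this issue; this is not the actual patch.

```scala
import java.util.concurrent.atomic.{AtomicBoolean, AtomicInteger}

// Hypothetical stand-in for the driver's reply to a heartbeat.
final case class HeartbeatResponse(reregisterBlockManager: Boolean)

// Hypothetical executor model illustrating the proposed check.
class SketchExecutor {
  // Set once shutdown begins, like the "executorShutdown" flag named in the issue.
  private val executorShutdown = new AtomicBoolean(false)

  // Counts calls to the stand-in for env.blockManager.reregister().
  val reregistrations = new AtomicInteger(0)

  private def reregisterBlockManager(): Unit = {
    reregistrations.incrementAndGet()
    ()
  }

  def stop(): Unit = executorShutdown.set(true)

  // Handles the driver's reply to a heartbeat sent from reportHeartBeat.
  def handleHeartbeatResponse(response: HeartbeatResponse): Unit =
    // Proposed guard: never re-register once shutdown has started.
    if (response.reregisterBlockManager && !executorShutdown.get()) {
      reregisterBlockManager()
    }
}
```

With the guard in place, a re-register request that arrives after stop() is ignored: calling handleHeartbeatResponse(HeartbeatResponse(reregisterBlockManager = true)) once before and once after stop() leaves reregistrations at 1.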



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
