This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
     new 5475088  [SPARK-35011][CORE] Fix false active executor in UI that caused by BlockManager reregistration
5475088 is described below

commit 54750887d8ba46abb379b576c539d25a17429f24
Author: wuyi <[email protected]>
AuthorDate: Thu Nov 11 16:18:38 2021 -0800

    [SPARK-35011][CORE] Fix false active executor in UI that caused by BlockManager reregistration
    
    ### What changes were proposed in this pull request?
    
    Also post the `SparkListenerExecutorRemoved` event when removing an executor that is known to `BlockManagerMaster` but unknown to `SchedulerBackend`.
    
    ### Why are the changes needed?
    
    https://github.com/apache/spark/pull/32114 reported that `BlockManagerMaster` could register a `BlockManager` from a dead executor because of the reregistration mechanism. As a side effect, the executor is shown in the UI as active even though it is already dead.
    
    In https://github.com/apache/spark/pull/32114, we tried to avoid such reregistration for a to-be-dead executor. However, I realized that we can actually leave such reregistration alone, since `HeartbeatReceiver.expireDeadHosts` should eventually clean up those `BlockManager`s. The problem is that the corresponding executors in the UI are not cleaned up along with the `BlockManager`s: executors in the UI can only be removed by `SparkListenerExecutorRemoved`, while `BlockManager` cleanup only posts `SparkListenerBlockManagerRemoved` (which is ignored by `AppStatusListener`).
    
    ### Does this PR introduce _any_ user-facing change?
    
    Yes, users will see the falsely active executor removed from the UI eventually.
    
    ### How was this patch tested?
    
    Pass existing tests.
    
    Closes #34536 from Ngone51/SPARK-35011.
    
    Lead-authored-by: wuyi <[email protected]>
    Co-authored-by: yi.wu <[email protected]>
    Signed-off-by: Dongjoon Hyun <[email protected]>
---
 .../spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala   | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala b/core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala
index b40eee3..326ea83 100644
--- a/core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala
+++ b/core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala
@@ -438,6 +438,14 @@ class CoarseGrainedSchedulerBackend(scheduler: TaskSchedulerImpl, val rpcEnv: Rp
           // about the executor, but the scheduler will not. Therefore, we should remove the
           // executor from the block manager when we hit this case.
           scheduler.sc.env.blockManager.master.removeExecutorAsync(executorId)
+          // SPARK-35011: If we reach this code path, which means the executor has been
+          // already removed from the scheduler backend but the block manager master may
+          // still know it. In this case, removing the executor from block manager master
+          // would only post the event `SparkListenerBlockManagerRemoved`, which is unfortunately
+          // ignored by `AppStatusListener`. As a result, the executor would be shown on the UI
+          // forever. Therefore, we should also post `SparkListenerExecutorRemoved` here.
+          listenerBus.post(SparkListenerExecutorRemoved(
+            System.currentTimeMillis(), executorId, reason.toString))
           logInfo(s"Asked to remove non-existent executor $executorId")
       }
     }
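
The behavior described in the commit message can be illustrated with a small, self-contained toy model. This is only a sketch and not Spark code: `ToyStatusListener`, `ToyBackend`, the three event case classes, and the `fixApplied` flag are all hypothetical stand-ins. It also assumes that a block manager registration is what makes the executor appear as active in the UI state, which is a simplification of the reregistration scenario above.

```scala
import scala.collection.mutable

object ReregistrationSketch {
  // Hypothetical, simplified stand-ins for Spark's listener events.
  sealed trait Event
  final case class BlockManagerAdded(executorId: String) extends Event
  final case class BlockManagerRemoved(executorId: String) extends Event
  final case class ExecutorRemoved(executorId: String, reason: String) extends Event

  // Toy UI state. Assumption: a block manager registration makes the executor
  // show up as active, and only ExecutorRemoved clears it.
  final class ToyStatusListener {
    private val active = mutable.Set.empty[String]

    def onEvent(event: Event): Unit = event match {
      case BlockManagerAdded(id)  => active.add(id)
      case ExecutorRemoved(id, _) => active.remove(id)
      case BlockManagerRemoved(_) => () // ignored, as the commit message describes
    }

    def activeExecutors: Set[String] = active.toSet
  }

  // Toy scheduler backend that models only the "non-existent executor" branch.
  final class ToyBackend(listener: ToyStatusListener, fixApplied: Boolean) {
    private val knownExecutors = mutable.Set.empty[String]

    def removeExecutor(id: String, reason: String): Unit = {
      if (knownExecutors.remove(id)) {
        listener.onEvent(ExecutorRemoved(id, reason))
      } else {
        // The backend no longer knows the executor, but the block manager
        // side may still know it, so its cleanup posts BlockManagerRemoved.
        listener.onEvent(BlockManagerRemoved(id))
        // The change in this commit corresponds to this extra event.
        if (fixApplied) listener.onEvent(ExecutorRemoved(id, reason))
      }
    }
  }

  // Replays the scenario: a dead executor's block manager reregisters,
  // then the backend is asked to remove that executor.
  def activeAfterRemoval(fixApplied: Boolean): Set[String] = {
    val listener = new ToyStatusListener
    val backend = new ToyBackend(listener, fixApplied)
    listener.onEvent(BlockManagerAdded("1"))
    backend.removeExecutor("1", "heartbeat timed out")
    listener.activeExecutors
  }

  def main(args: Array[String]): Unit = {
    println(s"without the extra event: ${activeAfterRemoval(fixApplied = false)}")
    println(s"with the extra event:    ${activeAfterRemoval(fixApplied = true)}")
  }
}
```

In this model, executor "1" stays in the active set when only `BlockManagerRemoved` is posted and disappears once `ExecutorRemoved` is posted as well, which mirrors the reasoning behind the eight added lines in the diff.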

