This is an automated email from the ASF dual-hosted git repository.
dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
new 5475088 [SPARK-35011][CORE] Fix false active executor in UI that
caused by BlockManager reregistration
5475088 is described below
commit 54750887d8ba46abb379b576c539d25a17429f24
Author: wuyi <[email protected]>
AuthorDate: Thu Nov 11 16:18:38 2021 -0800
[SPARK-35011][CORE] Fix false active executor in UI that caused by
BlockManager reregistration
### What changes were proposed in this pull request?
Also post the `SparkListenerExecutorRemoved` event when removing an
executor that is known to `BlockManagerMaster` but unknown to
`SchedulerBackend`.
### Why are the changes needed?
https://github.com/apache/spark/pull/32114 reported that
`BlockManagerMaster` could register a `BlockManager` from a dead executor
because of the reregistration mechanism. As a side effect, the executor is
shown as active on the UI even though it is already dead.
In https://github.com/apache/spark/pull/32114, we tried to prevent such
reregistration for a to-be-dead executor. However, I realized that we can
leave the reregistration alone, since
`HeartbeatReceiver.expireDeadHosts` should eventually clean up those
`BlockManager`s. The problem is that the corresponding executors in the UI
are not cleaned up along with the `BlockManager`s: executors in the UI can
only be removed by `SparkListenerExecutorRemoved`, while `BlockManager`
cleanup only posts `SparkListenerBlockManagerRemoved` (which is ignored by
`AppStatusListener`).
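The event mismatch described above can be illustrated with a small toy model. This is only a sketch under stated assumptions: `Event`, `ExecutorAdded`, `ExecutorRemoved`, `BlockManagerRemoved`, `ToyStatusListener`, and `FalseActiveExecutorDemo` are hypothetical stand-ins invented for illustration, not Spark's real listener classes, and the model keeps only the behavior stated in this description (the block manager event is ignored by the UI listener).

```scala
// Toy model only: these types mimic the roles of Spark's listener events.
// They are NOT the real org.apache.spark.scheduler classes.
sealed trait Event
final case class ExecutorAdded(executorId: String) extends Event
final case class ExecutorRemoved(time: Long, executorId: String, reason: String) extends Event
final case class BlockManagerRemoved(time: Long, executorId: String) extends Event

// Hypothetical stand-in for the listener that backs the UI's executor list.
final class ToyStatusListener {
  private val active = scala.collection.mutable.Set.empty[String]

  def onEvent(event: Event): Unit = event match {
    case ExecutorAdded(id)         => active += id
    case ExecutorRemoved(_, id, _) => active -= id
    case _: BlockManagerRemoved    => () // ignored, as this description states
  }

  def activeExecutors: Set[String] = active.toSet
}

object FalseActiveExecutorDemo {
  // Returns the executors the toy UI still considers active.
  def run(postExecutorRemoved: Boolean): Set[String] = {
    val ui = new ToyStatusListener
    ui.onEvent(ExecutorAdded("1"))
    // Cleaning up the reregistered block manager posts only this event...
    ui.onEvent(BlockManagerRemoved(0L, "1"))
    // ...unless the executor-removed event is posted as well.
    if (postExecutorRemoved) ui.onEvent(ExecutorRemoved(0L, "1", "lost"))
    ui.activeExecutors
  }

  def main(args: Array[String]): Unit = {
    println(run(postExecutorRemoved = false))
    println(run(postExecutorRemoved = true))
  }
}
```

In this model, the executor stays in the active set when only the block manager event is posted, and disappears once the executor-removed event is posted too.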
### Does this PR introduce _any_ user-facing change?
Yes, the falsely active executor is now eventually removed from the UI.
### How was this patch tested?
Pass existing tests.
Closes #34536 from Ngone51/SPARK-35011.
Lead-authored-by: wuyi <[email protected]>
Co-authored-by: yi.wu <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
---
.../spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala b/core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala
index b40eee3..326ea83 100644
--- a/core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala
+++ b/core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala
@@ -438,6 +438,14 @@ class CoarseGrainedSchedulerBackend(scheduler: TaskSchedulerImpl, val rpcEnv: Rp
           // about the executor, but the scheduler will not. Therefore, we should remove the
           // executor from the block manager when we hit this case.
           scheduler.sc.env.blockManager.master.removeExecutorAsync(executorId)
+          // SPARK-35011: If we reach this code path, which means the executor has been
+          // already removed from the scheduler backend but the block manager master may
+          // still know it. In this case, removing the executor from block manager master
+          // would only post the event `SparkListenerBlockManagerRemoved`, which is unfortunately
+          // ignored by `AppStatusListener`. As a result, the executor would be shown on the UI
+          // forever. Therefore, we should also post `SparkListenerExecutorRemoved` here.
+          listenerBus.post(SparkListenerExecutorRemoved(
+            System.currentTimeMillis(), executorId, reason.toString))
           logInfo(s"Asked to remove non-existent executor $executorId")
       }
     }