This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.2 by this push:
     new 1d8b556  [SPARK-38023][CORE] `ExecutorMonitor.onExecutorRemoved` should handle `ExecutorDecommission` as finished
1d8b556 is described below

commit 1d8b556bd600baaed23fe700b60bddb2bf7cfc5f
Author: Dongjoon Hyun <[email protected]>
AuthorDate: Tue Jan 25 14:34:56 2022 -0800

    [SPARK-38023][CORE] `ExecutorMonitor.onExecutorRemoved` should handle `ExecutorDecommission` as finished
    
    Although SPARK-36614 (https://github.com/apache/spark/pull/33868) fixed the
    UI issue, it introduced a regression that broke the `K8s integration test`
    and showed wrong metrics and messages to users. After
    `Finished decommissioning`, the executor was still counted as `unfinished`.
    This PR aims to fix this bug.
    
    **BEFORE**
    ```
    22/01/25 13:05:16 DEBUG KubernetesClusterSchedulerBackend$KubernetesDriverEndpoint:
    Asked to remove executor 1 with reason Finished decommissioning
    ...
    22/01/25 13:05:16 INFO ExecutorMonitor: Executor 1 is removed.
    Remove reason statistics: (gracefully decommissioned: 0, decommision unfinished: 1, driver killed: 0, unexpectedly exited: 0).
    ```
    
    **AFTER**
    ```
    Remove reason statistics: (gracefully decommissioned: 1, decommision unfinished: 0, driver killed: 0, unexpectedly exited: 0).
    ```
    
    ```
    $ build/sbt -Pkubernetes -Pkubernetes-integration-tests \
      -Dspark.kubernetes.test.dockerFile=resource-managers/kubernetes/docker/src/main/dockerfiles/spark/Dockerfile.java17 \
      -Dtest.exclude.tags=minikube,r "kubernetes-integration-tests/test"
    ```
    
    **BEFORE**
    The corresponding test case hangs and fails.
    ```
    [info] KubernetesSuite:
    ...
    [info] *** Test still running after 2 minutes, 13 seconds: suite name: KubernetesSuite, test name: Test decommissioning with dynamic allocation & shuffle cleanups.
    // Eventually fails
    ...
    ```
    
    **AFTER**
    ```
    [info] KubernetesSuite:
    ...
    [info] - Test decommissioning with dynamic allocation & shuffle cleanups (2 minutes, 41 seconds)
    ...
    ```
    
    Yes, this is a regression bug fix.
    
    Tested manually, because this should be verified via the K8s integration test:
    ```
    $ build/sbt -Pkubernetes -Pkubernetes-integration-tests \
      -Dspark.kubernetes.test.dockerFile=resource-managers/kubernetes/docker/src/main/dockerfiles/spark/Dockerfile.java17 \
      -Dtest.exclude.tags=minikube,r "kubernetes-integration-tests/test"
    ```
    
    Closes #35321 from dongjoon-hyun/SPARK-38023.
    
    Authored-by: Dongjoon Hyun <[email protected]>
    Signed-off-by: Dongjoon Hyun <[email protected]>
    (cherry picked from commit 9887d0f7f55157da1b9f55d7053cc6c78ea3cdc5)
    Signed-off-by: Dongjoon Hyun <[email protected]>
---
 .../scala/org/apache/spark/scheduler/dynalloc/ExecutorMonitor.scala    | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/core/src/main/scala/org/apache/spark/scheduler/dynalloc/ExecutorMonitor.scala b/core/src/main/scala/org/apache/spark/scheduler/dynalloc/ExecutorMonitor.scala
index acdfaa5..8ce24aa 100644
--- a/core/src/main/scala/org/apache/spark/scheduler/dynalloc/ExecutorMonitor.scala
+++ b/core/src/main/scala/org/apache/spark/scheduler/dynalloc/ExecutorMonitor.scala
@@ -355,7 +355,8 @@ private[spark] class ExecutorMonitor(
     if (removed != null) {
       decrementExecResourceProfileCount(removed.resourceProfileId)
       if (removed.decommissioning) {
-        if (event.reason == ExecutorLossMessage.decommissionFinished) {
+        if (event.reason == ExecutorLossMessage.decommissionFinished ||
+            event.reason == ExecutorDecommission().message) {
           metrics.gracefullyDecommissioned.inc()
         } else {
           metrics.decommissionUnfinished.inc()
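
The one-line condition change in the diff above can be modeled in isolation. In this sketch, the two message strings are assumptions mirroring Spark's `ExecutorLossMessage.decommissionFinished` and `ExecutorDecommission().message`; the classifier object itself is a standalone illustration, not Spark's API:

```scala
// Standalone model of the removal-reason classification in
// ExecutorMonitor.onExecutorRemoved. The string constants are
// assumptions mirroring Spark's internal messages.
object DecommissionClassifier {
  // Mirrors ExecutorLossMessage.decommissionFinished (assumption).
  val decommissionFinished = "Finished decommissioning"
  // Mirrors ExecutorDecommission().message (assumption).
  val executorDecommissionMessage = "Executor decommission."

  // Before the fix: only the exact decommissionFinished reason
  // incremented metrics.gracefullyDecommissioned.
  def isGracefulBefore(reason: String): Boolean =
    reason == decommissionFinished

  // After the fix: the generic ExecutorDecommission message is also
  // counted as a graceful decommission, matching the diff above.
  def isGracefulAfter(reason: String): Boolean =
    reason == decommissionFinished || reason == executorDecommissionMessage
}
```

Under this model, a removal reason equal to the `ExecutorDecommission` message is classified as `decommission unfinished` before the fix and as `gracefully decommissioned` after it, which is the metrics change shown in the BEFORE/AFTER logs.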

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
