Dmitro Valentiev created SPARK-34453:
----------------------------------------

             Summary: ExecutorPodsLifecycleManager fails to remove executors in 
Kubernetes, SPARK 3.0.1
                 Key: SPARK-34453
                 URL: https://issues.apache.org/jira/browse/SPARK-34453
             Project: Spark
          Issue Type: Bug
          Components: Kubernetes
    Affects Versions: 3.0.1
         Environment: SPARK 3.0.1
 EKS 1.15

Spark runs in a Kubernetes cluster, launched through spark-submit.
            Reporter: Dmitro Valentiev


This happens when the driver fails to register the reason behind an executor's deletion, e.g.:
{code:java}
2021-02-17 12:07:56,953 DEBUG 
KubernetesClusterSchedulerBackend$KubernetesDriverEndpoint:61 - Asked to remove 
executor 1 with reason The executor with id 1 was deleted by a user or the 
framework.
{code}
ExecutorPodsLifecycleManager then fails to remove the missing executor and gets
stuck in this loop:
{code:java}
2021-02-17 12:13:39,023 DEBUG ExecutorPodsLifecycleManager:61 - Removed 
executors with ids 3 from Spark that were either found to be deleted or 
non-existent in the cluster.
2021-02-17 12:15:09,042 DEBUG ExecutorPodsLifecycleManager:61 - The executor 
with ID 3 was not found in the cluster but we didn't get a reason why. Marking 
the executor as failed. The executor may have been deleted but the driver 
missed the deletion event.
{code}
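
The loop can be detected mechanically in a driver log: an executor ID that was already reported as removed shows up again in a later "was not found in the cluster" message. Below is a minimal sketch of such a check, assuming the two DEBUG messages are worded as quoted in this report and that the ID list is digits separated by commas or spaces. The names `EVENT`, `SAMPLE` and `repeated_removals` are hypothetical helpers, not part of Spark.

```python
import re
from collections import Counter

# Patterns copied from the two DEBUG messages quoted in this report.
# Assumption: the ID list consists of digits separated by commas or spaces.
EVENT = re.compile(
    r"Removed executors with ids (?P<removed>[\d,\s]+) from Spark"
    r"|The executor with ID (?P<missing>\d+) was not found in the cluster"
)

# Excerpt from the report, one log message per line.
SAMPLE = """\
2021-02-17 12:13:39,023 DEBUG ExecutorPodsLifecycleManager:61 - Removed executors with ids 3 from Spark that were either found to be deleted or non-existent in the cluster.
2021-02-17 12:15:09,042 DEBUG ExecutorPodsLifecycleManager:61 - The executor with ID 3 was not found in the cluster but we didn't get a reason why. Marking the executor as failed. The executor may have been deleted but the driver missed the deletion event.
"""


def repeated_removals(log_text):
    """Count 'not found' messages for executors already reported as removed.

    Returns a dict mapping executor ID (as a string) to the number of
    'not found' messages seen after that ID was reported as removed.
    """
    # Collapse whitespace so messages wrapped across lines still match.
    flat = " ".join(log_text.split())
    removed = set()
    repeats = Counter()
    for match in EVENT.finditer(flat):
        if match.group("removed") is not None:
            removed.update(re.findall(r"\d+", match.group("removed")))
        elif match.group("missing") in removed:
            repeats[match.group("missing")] += 1
    return dict(repeats)


print(repeated_removals(SAMPLE))  # prints {'3': 1}
```

A count that keeps growing for the same ID across successive polling rounds would match the behaviour described here.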
 

Steps to reproduce:
 # Deploy a Spark cluster into Kubernetes
 # Delete an executor pod through kubectl
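
The second step can be scripted. This is a minimal sketch, assuming `kubectl` is on the PATH and configured for the cluster, and that executor pods carry the `spark-role=executor` label that Spark on Kubernetes applies. The function names and the namespace argument are hypothetical.

```python
import subprocess


def list_executor_pods_cmd(namespace):
    """Build the kubectl command that lists executor pods as 'pod/<name>'."""
    return ["kubectl", "get", "pods", "-n", namespace,
            "-l", "spark-role=executor", "-o", "name"]


def delete_pod_cmd(pod, namespace):
    """Build the kubectl command that deletes one pod given as 'pod/<name>'."""
    return ["kubectl", "delete", "-n", namespace, pod]


def delete_first_executor(namespace):
    """Delete the first executor pod found in `namespace` and return its name."""
    listing = subprocess.run(list_executor_pods_cmd(namespace), check=True,
                             capture_output=True, text=True)
    pods = listing.stdout.split()
    if not pods:
        raise RuntimeError("no executor pods found in namespace " + namespace)
    subprocess.run(delete_pod_cmd(pods[0], namespace), check=True)
    return pods[0]
```

Calling `delete_first_executor("spark")` against a test cluster (the namespace name is an assumption) and then watching the driver log at DEBUG level should show whether the loop appears.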

 

Could be linked to, or a duplicate of, SPARK-28488.



 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

