[ 
https://issues.apache.org/jira/browse/SPARK-41060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Serhii Nesterov updated SPARK-41060:
------------------------------------
    Description: 
There's a problem with submitting Spark jobs to a K8s cluster: the library 
generates and reuses the same name for the config map (for drivers and 
executors). So, if we run 5 jobs sequentially or in parallel, one config map 
is created and then overwritten 4 times, which means this single config map is 
applied / used for all 5 jobs instead of one config map being created for each 
job. During those runs we experience issues when deleting pods from the 
cluster: executor pods are endlessly created and immediately terminated, 
overloading cluster resources.
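
The overwrite described above can be illustrated with a toy model. This is 
only a sketch: it treats a namespace as a plain dictionary keyed by config map 
name, and the constant name and the submit helper are hypothetical, not taken 
from Spark.

```python
# Toy model of a Kubernetes namespace: config maps are keyed by name only.
# FIXED_NAME and submit() are hypothetical and do not come from Spark.
FIXED_NAME = "spark-exec-conf-map"


def submit(namespace, job_id, name):
    """Create the config map, or overwrite it if the name already exists."""
    overwritten = name in namespace
    namespace[name] = {"spark.app.id": job_id}
    return overwritten


namespace = {}
results = [submit(namespace, f"job-{i}", FIXED_NAME) for i in range(1, 6)]

config_map_count = len(namespace)  # a single config map serves all five jobs
overwrites = sum(results)          # the first call creates, the rest overwrite
surviving_owner = namespace[FIXED_NAME]["spark.app.id"]
```

In this model the only surviving config map belongs to the last job submitted, 
so the four earlier jobs would read configuration that was not written for them.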

This problem occurs because the *KubernetesClientUtils* class defines 
*configMapNameExecutor* and *configMapNameDriver* as constants. This seems 
incorrect and should be fixed. I've prepared some changes for review to fix 
the issue.
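
One possible direction, shown here only as a sketch and not necessarily the 
change prepared for review, is to derive the config map name from something 
unique to each application instead of using a constant. The helper, the name 
pattern, and the random-suffix fallback below are all assumptions made for 
illustration.

```python
import uuid


def config_map_name(role, app_id=None):
    """Return a config map name unique per application (hypothetical helper)."""
    # Assumption: a short random suffix is acceptable when no app ID is given.
    suffix = app_id or uuid.uuid4().hex[:8]
    # Assumption: this name pattern is illustrative, not the one Spark uses.
    return f"spark-{role}-{suffix}-conf-map"


# Five jobs now get five distinct names, so no job overwrites another's map.
names = {config_map_name("exec", f"job-{i}") for i in range(1, 6)}
```

With names like these, concurrent or back-to-back submissions no longer share 
a single config map, which is the behaviour the report asks for.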

  was:There's a problem with submitting Spark jobs to a K8s cluster: the 
library generates and reuses the same name for the Config Map (for drivers and 
executors). So, if we run 5 jobs sequentially or in parallel, one Config Map 
is created and then overwritten 4 times. During those runs we experience 
issues when deleting pods from the cluster: executor pods are endlessly 
created and immediately terminated, overloading cluster resources.


> Spark Submitter generates a ConfigMap with the same name
> --------------------------------------------------------
>
>                 Key: SPARK-41060
>                 URL: https://issues.apache.org/jira/browse/SPARK-41060
>             Project: Spark
>          Issue Type: Bug
>          Components: Kubernetes
>    Affects Versions: 3.3.0, 3.3.1
>            Reporter: Serhii Nesterov
>            Priority: Major
>
> There's a problem with submitting Spark jobs to a K8s cluster: the library 
> generates and reuses the same name for the config map (for drivers and 
> executors). So, if we run 5 jobs sequentially or in parallel, one config map 
> is created and then overwritten 4 times, which means this single config map 
> is applied / used for all 5 jobs instead of one config map being created for 
> each job. During those runs we experience issues when deleting pods from the 
> cluster: executor pods are endlessly created and immediately terminated, 
> overloading cluster resources.
> This problem occurs because the *KubernetesClientUtils* class defines 
> *configMapNameExecutor* and *configMapNameDriver* as constants. This seems 
> incorrect and should be fixed. I've prepared some changes for review to fix 
> the issue.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)