[ 
https://issues.apache.org/jira/browse/SPARK-41060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Serhii Nesterov updated SPARK-41060:
------------------------------------
    Description: 
There's a problem with submitting Spark jobs to a K8s cluster: the library 
generates and reuses the same name for config maps (for drivers and executors). 
Ideally, two config maps should be created for each job: one for the driver and 
one for the executor. However, the library creates only one driver config map 
for all jobs (in some cases it generates only one executor config map for all 
jobs in the same manner). So, if I run 5 jobs, only one driver config map is 
generated and used for every job. During those runs we experience issues when 
deleting pods from the cluster: executor pods are endlessly created and 
immediately terminated, overloading cluster resources.

This problem occurs because of the *KubernetesClientUtils* class, in which 
*configMapNameExecutor* and *configMapNameDriver* are constants. This seems 
incorrect and should be fixed urgently. I've prepared some changes for review 
to fix the issue (tested in our project's cluster).
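
A minimal sketch of the kind of change involved, assuming the root cause described above (all names below are illustrative, not the actual Spark patch): deriving the config map name from a per-job identifier instead of a shared constant keeps concurrent submissions from colliding.

```scala
// Hypothetical sketch; names are illustrative and do not mirror the
// actual Spark code. The point: derive ConfigMap names from a per-job
// identifier instead of a constant shared by every submission.
object ConfigMapNaming {
  // With a fixed constant, every job in the namespace reuses the same
  // ConfigMap name, so concurrent jobs overwrite each other:
  //   val configMapNameDriver = "spark-drv-conf-map"  // shared by all jobs

  // Deriving the name from a per-application resource-name prefix keeps
  // each job's driver and executor ConfigMaps distinct.
  def driverConfigMapName(resourceNamePrefix: String): String =
    s"$resourceNamePrefix-driver-conf-map"

  def executorConfigMapName(resourceNamePrefix: String): String =
    s"$resourceNamePrefix-exec-conf-map"
}
```

With per-job prefixes, two concurrent submissions (e.g. `spark-app-1` and `spark-app-2`) each get their own driver and executor config maps, so deleting one job's resources no longer disturbs the other's pods.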

  was:
There's a problem with submitting spark jobs to K8s cluster: the library 
generates and reuses the same name for config maps (for drivers and executors). 
Ideally, for each job 2 config maps should created: for a driver and an 
executor. However, the library creates only one driver config map for all jobs 
(in some cases it generates only one executor map for all jobs). So, if I run 5 
jobs, then only one driver config map will be generated and used for every job. 
 During those runs we experience issues when deleting pods from the cluster: 
executors pods are endlessly created and immediately terminated overloading 
cluster resources.

This problem occurs because of the *KubernetesClientUtils* class in which we 
have *configMapNameExecutor* and *configMapNameDriver* as constants. It seems 
to be incorrect and should be urgently fixed. I've prepared some changes for 
review to fix the issue (tested in the cluster of our project).


> Spark Submitter generates a ConfigMap with the same name
> --------------------------------------------------------
>
>                 Key: SPARK-41060
>                 URL: https://issues.apache.org/jira/browse/SPARK-41060
>             Project: Spark
>          Issue Type: Bug
>          Components: Kubernetes
>    Affects Versions: 3.3.0, 3.3.1
>            Reporter: Serhii Nesterov
>            Priority: Major
>
> There's a problem with submitting Spark jobs to a K8s cluster: the library 
> generates and reuses the same name for config maps (for drivers and 
> executors). Ideally, two config maps should be created for each job: one for 
> the driver and one for the executor. However, the library creates only one 
> driver config map for all jobs (in some cases it generates only one executor 
> config map for all jobs in the same manner). So, if I run 5 jobs, only one 
> driver config map is generated and used for every job. During those runs we 
> experience issues when deleting pods from the cluster: executor pods are 
> endlessly created and immediately terminated, overloading cluster resources.
> This problem occurs because of the *KubernetesClientUtils* class, in which 
> *configMapNameExecutor* and *configMapNameDriver* are constants. This seems 
> incorrect and should be fixed urgently. I've prepared some changes for 
> review to fix the issue (tested in our project's cluster).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
