[ 
https://issues.apache.org/jira/browse/SPARK-41060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Serhii Nesterov updated SPARK-41060:
------------------------------------
    Description: 
There's a problem with submitting Spark jobs to a K8s cluster: the library 
generates and reuses the same name for config maps (for drivers and executors). 
Ideally, 2 config maps should be created for each job: one for the driver and 
one for the executor. However, the library creates only one driver config map 
for all jobs (in some cases it generates only one executor map for all jobs). 
So, if I run 5 jobs, then only one driver config map will be generated and used 
for every job. During those runs we experience issues when deleting pods from 
the cluster: executor pods are endlessly created and immediately terminated, 
overloading cluster resources.

This problem occurs because the *KubernetesClientUtils* class defines 
*configMapNameExecutor* and *configMapNameDriver* as constants. This seems 
incorrect and should be fixed urgently. I've prepared some changes for review 
to fix the issue (tested in our project's cluster).
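
The effect of defining the names as constants can be illustrated with a small, 
self-contained sketch. It is written in Python rather than Spark's Scala and is 
not the actual *KubernetesClientUtils* code: FixedNames, PerJobNames, unique_id, 
submit_jobs and the "spark-drv-...-conf-map" name format are all hypothetical, 
and the cluster namespace is modelled as a plain dict in which creating a map 
under an existing name replaces it.

```python
import uuid


def unique_id() -> str:
    """Hypothetical stand-in for a unique-suffix generator."""
    return uuid.uuid4().hex


class FixedNames:
    """Names evaluated once, when the class is defined (constant-like)."""
    DRIVER = f"spark-drv-{unique_id()}-conf-map"
    EXECUTOR = f"spark-exec-{unique_id()}-conf-map"


class PerJobNames:
    """Names evaluated on every call, so each submission gets a fresh one."""

    @staticmethod
    def driver() -> str:
        return f"spark-drv-{unique_id()}-conf-map"

    @staticmethod
    def executor() -> str:
        return f"spark-exec-{unique_id()}-conf-map"


def submit_jobs(name_for_job, job_count: int) -> dict:
    """Simulate a namespace as a dict keyed by config map name.

    Assumption: creating a map under an existing name replaces it.
    """
    namespace = {}
    for job in range(job_count):
        namespace[name_for_job()] = f"config for job {job}"
    return namespace


# Five jobs submitted from the same process with constant names.
shared = submit_jobs(lambda: FixedNames.DRIVER, 5)
# Five jobs submitted with a name generated per submission.
separate = submit_jobs(PerJobNames.driver, 5)

print(len(shared))    # 1
print(len(separate))  # 5
```

With constant names, all five simulated jobs end up sharing a single map that 
holds the last job's configuration; generating the name per submission gives 
each job its own map. The sketch assumes that all jobs are submitted from one 
long-running process, which is when a constant would be reused.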

  was:
There's a problem with submitting a Spark job to a K8s cluster: the library 
generates and reuses the same name for the config map (for drivers and 
executors). So, if we run 5 jobs sequentially or in parallel, then one config 
map will be created and then overwritten 4 times, which means this config map 
will be applied / used for all 5 jobs instead of one config map being created 
for each job. During those runs we experience issues when deleting pods from 
the cluster: executor pods are endlessly created and immediately terminated, 
overloading cluster resources.

This problem occurs because of the *KubernetesClientUtils* class in which we 
have *configMapNameExecutor* and *configMapNameDriver* as constants. It seems 
to be incorrect and should be fixed. I've prepared some changes for review to 
fix the issue.


> Spark Submitter generates a ConfigMap with the same name
> --------------------------------------------------------
>
>                 Key: SPARK-41060
>                 URL: https://issues.apache.org/jira/browse/SPARK-41060
>             Project: Spark
>          Issue Type: Bug
>          Components: Kubernetes
>    Affects Versions: 3.3.0, 3.3.1
>            Reporter: Serhii Nesterov
>            Priority: Major



