[
https://issues.apache.org/jira/browse/SPARK-41060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Serhii Nesterov updated SPARK-41060:
------------------------------------
Description:
There's a problem with submitting Spark jobs to a K8s cluster: the library
generates and reuses the same name for config maps (for drivers and executors).
Ideally, two config maps should be created for each job: one for the driver and
one for the executors. However, the library creates only one driver config map
for all jobs (and in some cases only one executor config map for all jobs). So,
if I run 5 jobs, only one driver config map is generated and used for every job.
During those runs we experience issues when deleting pods from the cluster:
executor pods are endlessly created and immediately terminated, overloading
cluster resources.
This problem occurs because of the *KubernetesClientUtils* class, in which
*configMapNameExecutor* and *configMapNameDriver* are constants. This seems
incorrect and should be fixed urgently. I've prepared some changes for review
to fix the issue (tested in our project's cluster).
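The root cause described above can be sketched as follows. This is a minimal, hypothetical illustration of why a constant produces one shared config map name per JVM, and of one possible fix (deriving a fresh name per submission); the names below are assumptions for illustration, not the actual Spark code or patch.

```scala
// Hypothetical sketch of the naming collision (not the real Spark source).
// A `val` in a Scala object is evaluated once, when the object is first
// initialized, so every job submitted from the same JVM reuses the name.
object ConfigMapNamesSketch {
  private def uniqueSuffix(): String =
    java.util.UUID.randomUUID().toString.take(8)

  // Problematic pattern: computed once, shared by all submissions.
  val sharedDriverConfigMapName: String =
    s"spark-drv-conf-map-${uniqueSuffix()}"

  // One possible fix: a `def` is re-evaluated on each call, so every
  // submitted job gets its own config map name.
  def perJobDriverConfigMapName(): String =
    s"spark-drv-conf-map-${uniqueSuffix()}"
}
```

With the `val`, two submissions read `sharedDriverConfigMapName` and get the same string, so the second job overwrites the first job's ConfigMap; with the `def`, each call yields a distinct name.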
was:
There's a problem with submitting Spark jobs to a K8s cluster: the library
generates and reuses the same name for the config map (for drivers and
executors). So, if we run 5 jobs, sequentially or in parallel, one config map
is created and then overwritten 4 times, which means this single config map is
applied to all 5 jobs instead of one config map being created per job.
During those runs we experience issues when deleting pods from the cluster:
executor pods are endlessly created and immediately terminated, overloading
cluster resources.
This problem occurs because of the *KubernetesClientUtils* class, in which
*configMapNameExecutor* and *configMapNameDriver* are constants. This seems
incorrect and should be fixed. I've prepared some changes for review to fix
the issue.
> Spark Submitter generates a ConfigMap with the same name
> --------------------------------------------------------
>
> Key: SPARK-41060
> URL: https://issues.apache.org/jira/browse/SPARK-41060
> Project: Spark
> Issue Type: Bug
> Components: Kubernetes
> Affects Versions: 3.3.0, 3.3.1
> Reporter: Serhii Nesterov
> Priority: Major
>
> There's a problem with submitting Spark jobs to a K8s cluster: the library
> generates and reuses the same name for config maps (for drivers and
> executors). Ideally, two config maps should be created for each job: one for
> the driver and one for the executors. However, the library creates only one
> driver config map for all jobs (and in some cases only one executor config
> map for all jobs). So, if I run 5 jobs, only one driver config map is
> generated and used for every job. During those runs we experience issues
> when deleting pods from the cluster: executor pods are endlessly created and
> immediately terminated, overloading cluster resources.
> This problem occurs because of the *KubernetesClientUtils* class, in which
> *configMapNameExecutor* and *configMapNameDriver* are constants. This seems
> incorrect and should be fixed urgently. I've prepared some changes for
> review to fix the issue (tested in our project's cluster).
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]