[ https://issues.apache.org/jira/browse/SPARK-41060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Serhii Nesterov updated SPARK-41060:
------------------------------------
Description:
*Description of the issue:*
There is a problem when submitting Spark jobs to a K8s cluster: the library
generates and reuses the same ConfigMap names for drivers and executors.
Ideally, two ConfigMaps should be created per job: one for the driver and one
for the executors. However, the library creates only one driver ConfigMap for
all jobs (and in some cases a single executor ConfigMap for all jobs in the
same manner). So, if I run 5 jobs, only one driver ConfigMap is generated and
used by every job. During such runs we also hit issues when deleting pods
from the cluster: executor pods are endlessly created and immediately
terminated, overloading cluster resources.
*The cause of the issue:*
The problem comes from the *KubernetesClientUtils* object, which defines
*configMapNameExecutor* and *configMapNameDriver* as constants, so the names
are computed once per JVM rather than once per submission. This looks
incorrect and should be fixed urgently. I have prepared changes for review
that fix the issue (tested on our project's cluster); the pattern is sketched
below.
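For illustration, a minimal sketch of that pattern (a simplified stand-in,
not the actual Spark source; the uniqueID() helper and the object name here
are assumptions): a val on a singleton object is initialized once per JVM, so
every job submitted from that JVM reuses the same ConfigMap name, whereas a
def would produce a fresh name per submission.
{code:scala}
// Simplified stand-in for the pattern described above, NOT the actual
// Spark source; uniqueID() is an assumed helper for illustration.
object ConfigMapNamingSketch {
  private def uniqueID(): String = java.util.UUID.randomUUID().toString.take(8)

  // vals on a singleton object are initialized once per JVM, so every job
  // submitted from this JVM sees the SAME two ConfigMap names.
  val configMapNameDriver: String = s"spark-drv-${uniqueID()}-conf-map"
  val configMapNameExecutor: String = s"spark-exec-${uniqueID()}-conf-map"

  // A per-submission def would produce a fresh name for each job instead.
  def freshDriverConfigMapName(): String = s"spark-drv-${uniqueID()}-conf-map"
}
{code}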
*Steps to reproduce the issue:*
# Create a *KubernetesClientApplication* object.
# Submit at least 2 jobs, either sequentially or in parallel using *Thread*
(see the sketch after this list).
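A hedged sketch of such a reproduction: the class comes from Spark's
spark-kubernetes module, but the master URL, namespace, image, and the
submission arguments below are placeholder assumptions to be replaced with
values matching your cluster.
{code:scala}
import org.apache.spark.SparkConf
import org.apache.spark.deploy.k8s.submit.KubernetesClientApplication

object ReproduceSharedConfigMap {
  // Submits one job to the cluster; all concrete values are placeholders.
  def submitJob(appName: String): Unit = {
    val conf = new SparkConf()
      .setMaster("k8s://https://example-apiserver:6443") // placeholder
      .set("spark.kubernetes.namespace", "default")
      .set("spark.kubernetes.container.image", "spark:3.3.1") // placeholder
      .set("spark.app.name", appName)
    // start() builds the driver pod spec and creates the driver ConfigMap
    // through KubernetesClientUtils.
    new KubernetesClientApplication()
      .start(Array("--main-class", "org.apache.spark.examples.SparkPi"), conf)
  }

  def main(args: Array[String]): Unit = {
    // Two jobs from the same JVM: with constant ConfigMap names the second
    // submission reuses (3.1.2) or collides with (3.3.*) the first one's map.
    val t1 = new Thread(() => submitJob("job-1"))
    val t2 = new Thread(() => submitJob("job-2"))
    t1.start(); t2.start()
    t1.join(); t2.join()
  }
}
{code}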
*The results of my observations for these steps are as follows:*
# Spark 3.1.2 - the same ConfigMap in K8s is overwritten, which means all
jobs point to the same ConfigMap.
# Spark 3.3.* - a new ConfigMap is created for the first job; for every other
job an exception is thrown, because the Fabric8 Kubernetes client does not
allow creating a ConfigMap under an existing name (illustrated below).
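The 3.3.* behavior can be seen in isolation with the Fabric8 client. A
minimal sketch, assuming a reachable cluster and the Fabric8 5.x client (the
line used by Spark 3.3) on the classpath; the namespace and ConfigMap name
are placeholders:
{code:scala}
import io.fabric8.kubernetes.api.model.ConfigMapBuilder
import io.fabric8.kubernetes.client.{DefaultKubernetesClient, KubernetesClientException}

object DuplicateConfigMapDemo {
  def main(args: Array[String]): Unit = {
    // Placeholder namespace and name; assumes a reachable cluster.
    val client = new DefaultKubernetesClient()
    try {
      val cm = new ConfigMapBuilder()
        .withNewMetadata().withName("spark-drv-conf-map").endMetadata()
        .addToData("spark.properties", "spark.app.name=demo")
        .build()
      val ops = client.configMaps().inNamespace("default")
      ops.create(cm) // first job: the ConfigMap is created
      ops.create(cm) // second job: throws, the name is already taken
    } catch {
      case e: KubernetesClientException =>
        println(s"second create failed as expected: ${e.getMessage}")
    } finally {
      client.close()
    }
  }
}
{code}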
> Spark Submitter generates a ConfigMap with the same name
> --------------------------------------------------------
>
> Key: SPARK-41060
> URL: https://issues.apache.org/jira/browse/SPARK-41060
> Project: Spark
> Issue Type: Bug
> Components: Kubernetes
> Affects Versions: 3.1.2, 3.3.0, 3.3.1
> Reporter: Serhii Nesterov
> Priority: Major