[ https://issues.apache.org/jira/browse/SPARK-41060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Serhii Nesterov updated SPARK-41060:
------------------------------------
Description:
*Description of the issue:*
There is a problem when submitting Spark jobs to a K8s cluster: the library
generates and reuses the same ConfigMap names for drivers and executors.
Ideally, two ConfigMaps should be created per job: one for the driver and one
for the executors. However, the library creates only one driver ConfigMap for
all jobs (and in some cases a single executor ConfigMap for all jobs in the
same manner). So, if I run 5 jobs, only one driver ConfigMap is generated and
used by every job. During such runs we also hit issues when deleting pods
from the cluster: executor pods are endlessly created and immediately
terminated, overloading cluster resources.
*The cause of the issue:*
The problem comes from the *KubernetesClientUtils* object, which defines
*configMapNameExecutor* and *configMapNameDriver* as constants, so the names
are computed once per JVM rather than once per submission. This looks
incorrect and should be fixed urgently. I have prepared changes for review
that fix the issue (tested on our project's cluster); the pattern is sketched
below.
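For illustration, a minimal sketch of that pattern (a simplified stand-in,
not the actual Spark source; the uniqueID() helper and the object name here
are assumptions): a val on a singleton object is initialized once per JVM, so
every job submitted from that JVM reuses the same ConfigMap name, whereas a
def would produce a fresh name per submission.
{code:scala}
// Simplified stand-in for the pattern described above, NOT the actual
// Spark source; uniqueID() is an assumed helper for illustration.
object ConfigMapNamingSketch {
  private def uniqueID(): String = java.util.UUID.randomUUID().toString.take(8)

  // vals on a singleton object are initialized once per JVM, so every job
  // submitted from this JVM sees the SAME two ConfigMap names.
  val configMapNameDriver: String = s"spark-drv-${uniqueID()}-conf-map"
  val configMapNameExecutor: String = s"spark-exec-${uniqueID()}-conf-map"

  // A per-submission def would produce a fresh name for each job instead.
  def freshDriverConfigMapName(): String = s"spark-drv-${uniqueID()}-conf-map"
}
{code}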
*Steps to reproduce the issue:*
# Create a *KubernetesClientApplication* object.
# Submit at least 2 jobs, either sequentially or in parallel using *Thread*
(see the sketch after this list).
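A hedged sketch of such a reproduction: the class comes from Spark's
spark-kubernetes module, but the master URL, namespace, image, and the
submission arguments below are placeholder assumptions to be replaced with
values matching your cluster.
{code:scala}
import org.apache.spark.SparkConf
import org.apache.spark.deploy.k8s.submit.KubernetesClientApplication

object ReproduceSharedConfigMap {
  // Submits one job to the cluster; all concrete values are placeholders.
  def submitJob(appName: String): Unit = {
    val conf = new SparkConf()
      .setMaster("k8s://https://example-apiserver:6443") // placeholder
      .set("spark.kubernetes.namespace", "default")
      .set("spark.kubernetes.container.image", "spark:3.3.1") // placeholder
      .set("spark.app.name", appName)
    // start() builds the driver pod spec and creates the driver ConfigMap
    // through KubernetesClientUtils.
    new KubernetesClientApplication()
      .start(Array("--main-class", "org.apache.spark.examples.SparkPi"), conf)
  }

  def main(args: Array[String]): Unit = {
    // Two jobs from the same JVM: with constant ConfigMap names the second
    // submission reuses (3.1.2) or collides with (3.3.*) the first one's map.
    val t1 = new Thread(() => submitJob("job-1"))
    val t2 = new Thread(() => submitJob("job-2"))
    t1.start(); t2.start()
    t1.join(); t2.join()
  }
}
{code}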
*The results of my observations for these steps are as follows:*
# Spark 3.1.2 - the same ConfigMap in K8s is overwritten, which means all
jobs point to the same ConfigMap.
# Spark 3.3.* - a new ConfigMap is created for the first job; for every other
job an exception is thrown, because the Fabric8 Kubernetes client does not
allow creating a ConfigMap under an existing name (illustrated below).
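The 3.3.* behavior can be seen in isolation with the Fabric8 client. A
minimal sketch, assuming a reachable cluster and the Fabric8 5.x client (the
line used by Spark 3.3) on the classpath; the namespace and ConfigMap name
are placeholders:
{code:scala}
import io.fabric8.kubernetes.api.model.ConfigMapBuilder
import io.fabric8.kubernetes.client.{DefaultKubernetesClient, KubernetesClientException}

object DuplicateConfigMapDemo {
  def main(args: Array[String]): Unit = {
    // Placeholder namespace and name; assumes a reachable cluster.
    val client = new DefaultKubernetesClient()
    try {
      val cm = new ConfigMapBuilder()
        .withNewMetadata().withName("spark-drv-conf-map").endMetadata()
        .addToData("spark.properties", "spark.app.name=demo")
        .build()
      val ops = client.configMaps().inNamespace("default")
      ops.create(cm) // first job: the ConfigMap is created
      ops.create(cm) // second job: throws, the name is already taken
    } catch {
      case e: KubernetesClientException =>
        println(s"second create failed as expected: ${e.getMessage}")
    } finally {
      client.close()
    }
  }
}
{code}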
> Spark Submitter generates a ConfigMap with the same name
> --------------------------------------------------------
>
> Key: SPARK-41060
> URL: https://issues.apache.org/jira/browse/SPARK-41060
> Project: Spark
> Issue Type: Bug
> Components: Kubernetes
> Affects Versions: 3.1.2, 3.3.0, 3.3.1
> Reporter: Serhii Nesterov
> Priority: Major