[
https://issues.apache.org/jira/browse/SPARK-41060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Serhii Nesterov updated SPARK-41060:
------------------------------------
Description:
There's a problem with submitting Spark jobs to a K8s cluster: the library
generates and reuses the same name for the config map (for drivers and
executors). So, if we run 5 jobs sequentially or in parallel, one config map
is created and then overwritten 4 times, which means this single config map is
applied / used for all 5 jobs instead of one config map being created for each
job. During those runs we experience issues when deleting pods from the
cluster: executor pods are endlessly created and immediately terminated,
overloading cluster resources.
This problem occurs because of the *KubernetesClientUtils* class, in which
*configMapNameExecutor* and *configMapNameDriver* are constants. This seems
to be incorrect and should be fixed. I've prepared some changes for review to
fix the issue.
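To illustrate the overwrite behaviour described above, here is a minimal
sketch in plain Python. It is not Spark code and not the proposed patch: the
dictionary standing in for a namespace, the name "spark-conf-map", and the
helpers unique_config_map_name and submit_jobs are all hypothetical. It only
shows why a constant name leaves one config map for 5 jobs, while a name with
a per-submission suffix leaves 5.

```python
import uuid

# Hypothetical fixed name, standing in for a constant such as
# configMapNameDriver; the real value in Spark is not reproduced here.
FIXED_CONFIG_MAP_NAME = "spark-conf-map"


def unique_config_map_name(prefix: str) -> str:
    """Return the prefix plus a random suffix, so each submission gets its own name."""
    return f"{prefix}-{uuid.uuid4().hex[:12]}"


def submit_jobs(job_ids, name_for_job):
    """Simulate a namespace as a dict: config map name -> job whose config it holds."""
    namespace = {}
    for job_id in job_ids:
        # Writing under an existing name replaces the previous job's config.
        namespace[name_for_job(job_id)] = job_id
    return namespace


jobs = [f"job-{i}" for i in range(1, 6)]

# Constant name: every submission writes to the same entry.
with_constant = submit_jobs(jobs, lambda _job: FIXED_CONFIG_MAP_NAME)

# Per-submission name: every job keeps its own entry.
with_unique = submit_jobs(jobs, lambda _job: unique_config_map_name(FIXED_CONFIG_MAP_NAME))

print(len(with_constant), "config map(s) with a constant name")
print(len(with_unique), "config map(s) with unique names")
```

With the constant name, only the last job's configuration survives in the
simulated namespace; with the suffixed names, each job's configuration is
kept separately.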
was: There's a problem with submitting Spark jobs to a K8s cluster: the library
generates and reuses the same name for the Config Map (for drivers and
executors). So, if we run 5 jobs sequentially or in parallel, one Config Map
is created and then overwritten 4 times. During those runs we experience
issues when deleting pods from the cluster: executor pods are endlessly
created and immediately terminated, overloading cluster resources.
> Spark Submitter generates a ConfigMap with the same name
> --------------------------------------------------------
>
> Key: SPARK-41060
> URL: https://issues.apache.org/jira/browse/SPARK-41060
> Project: Spark
> Issue Type: Bug
> Components: Kubernetes
> Affects Versions: 3.3.0, 3.3.1
> Reporter: Serhii Nesterov
> Priority: Major
>
> There's a problem with submitting Spark jobs to a K8s cluster: the library
> generates and reuses the same name for the config map (for drivers and
> executors). So, if we run 5 jobs sequentially or in parallel, one config map
> is created and then overwritten 4 times, which means this single config map
> is applied / used for all 5 jobs instead of one config map being created for
> each job. During those runs we experience issues when deleting pods from the
> cluster: executor pods are endlessly created and immediately terminated,
> overloading cluster resources.
> This problem occurs because of the *KubernetesClientUtils* class, in which
> *configMapNameExecutor* and *configMapNameDriver* are constants. This seems
> to be incorrect and should be fixed. I've prepared some changes for review to
> fix the issue.