[
https://issues.apache.org/jira/browse/SPARK-32067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
James Yu updated SPARK-32067:
-----------------------------
Summary: [K8S] Executor pod template config map of ongoing submission got
inadvertently altered by subsequent submission (was: [K8S] Executor pod
template of ongoing submission got inadvertently altered by subsequent
submission)
> [K8S] Executor pod template config map of ongoing submission got
> inadvertently altered by subsequent submission
> ---------------------------------------------------------------------------------------------------------------
>
> Key: SPARK-32067
> URL: https://issues.apache.org/jira/browse/SPARK-32067
> Project: Spark
> Issue Type: Bug
> Components: Kubernetes
> Affects Versions: 2.4.6, 3.0.0
> Reporter: James Yu
> Priority: Minor
>
> THE BUG:
> The bug can be reproduced by using spark-submit to submit two different apps
> (app1 and app2) with different executor pod templates (e.g., different
> labels) to K8s sequentially, with app2 launching while app1 is still ramping
> up its executor pods. As a result, some of app1's executor pods are launched
> with app2's executor pod template applied to them.
> The root cause appears to be that app1's podspec-configmap is overwritten by
> app2 during the overlapping launch period, because both apps use the same
> configmap name. Any app1 executor pods that are ramped up after app2 is
> launched are therefore inadvertently created with app2's pod template. The
> issue can be seen as follows:
> First, after submitting app1, you get these configmaps:
> {code:java}
> NAMESPACE   NAME                                    DATA   AGE
> default     app1-1111111111111111-driver-conf-map   1      9m46s
> default     podspec-configmap                       1      12m{code}
> Then submit app2 while app1 is still ramping up its executors. The
> podspec-configmap is modified by app2:
> {code:java}
> NAMESPACE   NAME                                    DATA   AGE
> default     app1-1111111111111111-driver-conf-map   1      11m43s
> default     app2-2222222222222222-driver-conf-map   1      10s
> default     podspec-configmap                       1      13m57s{code}
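The overwrite described above can be illustrated with a minimal Python sketch. It is a toy model only: the namespace is a plain dictionary, and `submit` and `launch_executor` are hypothetical helpers that assume a fixed config map name with create-or-replace semantics. None of this is Spark's or Kubernetes' actual code.

```python
# Toy model of one Kubernetes namespace: config map name -> data.
# Illustration only; these helpers are hypothetical, not Spark APIs.
namespace = {}


def submit(app_id, executor_labels, podspec_name="podspec-configmap"):
    """Store the executor pod template under a fixed config map name,
    replacing whatever another submission stored there."""
    namespace[f"{app_id}-driver-conf-map"] = {"app": app_id}
    namespace[podspec_name] = {"labels": dict(executor_labels)}


def launch_executor(app_id, podspec_name="podspec-configmap"):
    """Read the pod template at launch time, as a late executor would."""
    return {"app": app_id, "labels": dict(namespace[podspec_name]["labels"])}


submit("app1", {"team": "a"})
early = launch_executor("app1")  # launched before app2 is submitted
submit("app2", {"team": "b"})    # overwrites the shared config map
late = launch_executor("app1")   # app1 executor picks up app2's labels

print(early["labels"])
print(late["labels"])
```

In this model the early app1 executor keeps its own labels, while the one launched after app2's submission carries app2's labels, which mirrors the reported symptom.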
>
> PROPOSED SOLUTION:
> Prefix the podspec-configmap name for each submitted app, in the same way as
> the driver-conf-map, so that each app gets its own configmap:
> {code:java}
> NAMESPACE   NAME                                      DATA   AGE
> default     app1-1111111111111111-driver-conf-map     1      11m43s
> default     app1-1111111111111111-podspec-configmap   1      13m57s
> default     app2-2222222222222222-driver-conf-map     1      10s
> default     app2-2222222222222222-podspec-configmap   1      3m{code}
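The proposed naming scheme can be sketched the same way. This is a hedged Python illustration, not the actual fix: the namespace is again a dictionary, and `podspec_configmap_name`, `submit`, and `launch_executor` are hypothetical helpers that assume each submission has a unique resource prefix like the ones shown in the listing.

```python
# Toy model of one Kubernetes namespace: config map name -> data.
# Illustration only; these helpers are hypothetical, not Spark APIs.
namespace = {}


def podspec_configmap_name(resource_prefix):
    """Derive a per-app config map name from the app's resource prefix."""
    return f"{resource_prefix}-podspec-configmap"


def submit(resource_prefix, executor_labels):
    """Store the executor pod template under a name unique to this app."""
    namespace[f"{resource_prefix}-driver-conf-map"] = {}
    namespace[podspec_configmap_name(resource_prefix)] = {
        "labels": dict(executor_labels)
    }


def launch_executor(resource_prefix):
    """Read only the launching app's own pod template."""
    return dict(namespace[podspec_configmap_name(resource_prefix)]["labels"])


submit("app1-1111111111111111", {"team": "a"})
submit("app2-2222222222222222", {"team": "b"})
app1_labels = launch_executor("app1-1111111111111111")  # after app2 exists
app2_labels = launch_executor("app2-2222222222222222")

for name in sorted(namespace):
    print(name)
```

Because each app writes to and reads from its own key, a later submission can no longer change the template used by an earlier app's executors in this model.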