[
https://issues.apache.org/jira/browse/SPARK-32067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
James Yu updated SPARK-32067:
-----------------------------
Description:
THE BUG:
The bug can be reproduced by using spark-submit to submit two different apps
(app1 and app2) with different executor pod templates (e.g., different labels)
to K8s sequentially, with app2 launching while app1 is still ramping up its
executor pods. The unwanted result is that some of app1's executor pods end up
with app2's executor pod template applied to them.
The root cause appears to be that app1's podspec-configmap is overwritten by
app2 while the two launches overlap, because both apps use the same configmap
name. As a result, app1 executor pods that are started after app2 is launched
are inadvertently created with app2's pod template. The issue can be seen as
follows:
First, after submitting app1, you get these configmaps:
{code:java}
NAMESPACE   NAME                                      DATA   AGE
default     app1-1111111111111111-driver-conf-map     1      9m46s
default     podspec-configmap                         1      12m{code}
Then submit app2 while app1 is still ramping up its executors. The
podspec-configmap is modified by app2:
{code:java}
NAMESPACE   NAME                                      DATA   AGE
default     app1-1111111111111111-driver-conf-map     1      11m43s
default     app2-2222222222222222-driver-conf-map     1      10s
default     podspec-configmap                         1      13m57s{code}
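The overwrite can be illustrated with a minimal Python sketch that models a namespace as a dictionary keyed by config map name. It does not call the Kubernetes API. The `submit` and `executor_template_for` helpers and the template strings are hypothetical stand-ins, and the sketch assumes (following the suspected root cause above) that every submission writes its executor template under the fixed name `podspec-configmap` and that an executor launch uses whatever is stored under that name at that moment.

```python
# Toy model of the collision: a namespace is a dict of config map name -> data.
# Nothing here talks to a real cluster; all names are illustrative assumptions.

SHARED_NAME = "podspec-configmap"  # fixed name, identical for every app


def submit(namespace, app_id, executor_template):
    """Model a submission: create the driver config map and write the template."""
    namespace[f"{app_id}-driver-conf-map"] = f"driver conf for {app_id}"
    # The name is not app-specific, so a later submission replaces
    # whatever an earlier one stored here.
    namespace[SHARED_NAME] = executor_template


def executor_template_for(namespace):
    """Model an executor launch reading the template currently stored."""
    return namespace[SHARED_NAME]


namespace = {}
submit(namespace, "app1-1111111111111111", "labels: {app: app1}")
early_app1_executor = executor_template_for(namespace)

# app2 is submitted while app1 is still ramping up its executors.
submit(namespace, "app2-2222222222222222", "labels: {app: app2}")
late_app1_executor = executor_template_for(namespace)

print(early_app1_executor)  # labels: {app: app1}
print(late_app1_executor)   # labels: {app: app2}
```

In this model the namespace ends up with three entries, matching the listing above: two driver config maps and a single shared podspec-configmap that now holds app2's template.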
PROPOSED SOLUTION:
Give each submitted app its own podspec-configmap by prefixing the name with
the app's resource name prefix, as is already done for the driver-conf-map:
{code:java}
NAMESPACE   NAME                                      DATA   AGE
default     app1-1111111111111111-driver-conf-map     1      11m43s
default     app1-1111111111111111-podspec-configmap   1      13m57s
default     app2-2222222222222222-driver-conf-map     1      10s
default     app2-2222222222222222-podspec-configmap   1      3m{code}
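As a rough illustration of the proposed naming, the following Python sketch derives the config map name from each app's resource prefix, so that two overlapping submissions write to different keys. The helper `podspec_configmap_name`, the template strings, and the dictionary standing in for the namespace are hypothetical; this is not Spark's actual implementation.

```python
def podspec_configmap_name(resource_prefix):
    """Build a per-app name, mirroring the <prefix>-driver-conf-map pattern."""
    return f"{resource_prefix}-podspec-configmap"


# Dictionary standing in for the namespace: config map name -> pod template.
namespace = {}
for prefix, template in [
    ("app1-1111111111111111", "labels: {app: app1}"),
    ("app2-2222222222222222", "labels: {app: app2}"),
]:
    # Each app writes under its own name, so neither overwrites the other.
    namespace[podspec_configmap_name(prefix)] = template

for name in sorted(namespace):
    print(name)
```

With app-specific names, app1's template stays intact no matter when app2 is submitted.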
> [K8S] Executor pod template config map of ongoing submission got
> inadvertently altered by subsequent submission
> ---------------------------------------------------------------------------------------------------------------
>
> Key: SPARK-32067
> URL: https://issues.apache.org/jira/browse/SPARK-32067
> Project: Spark
> Issue Type: Bug
> Components: Kubernetes
> Affects Versions: 2.4.6, 3.0.0
> Reporter: James Yu
> Priority: Minor
>