Xiaobao Wu created SPARK-52616:
----------------------------------
Summary: In some scenarios, the Volcano podgroup created by Spark in
cluster mode is useless
Key: SPARK-52616
URL: https://issues.apache.org/jira/browse/SPARK-52616
Project: Spark
Issue Type: Bug
Components: Kubernetes
Affects Versions: 3.3.2
Environment: {*}K8S version{*}:
{code:bash}
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"25", GitVersion:"v1.25.8",
GitCommit:"0ce7342c984110dfc93657d64df5dc3b2c0d1fe9", GitTreeState:"clean",
BuildDate:"2023-03-15T13:39:54Z", GoVersion:"go1.19.7", Compiler:"gc",
Platform:"linux/amd64"}
Kustomize Version: v4.5.7
Server Version: version.Info{Major:"1", Minor:"25", GitVersion:"v1.25.8",
GitCommit:"0ce7342c984110dfc93657d64df5dc3b2c0d1fe9", GitTreeState:"clean",
BuildDate:"2023-03-15T13:33:02Z", GoVersion:"go1.19.7", Compiler:"gc",
Platform:"linux/amd64"}
{code}
{*}OS{*}:
{code:bash}
$ cat /etc/os-release
NAME="Kylin Linux Advanced Server"
VERSION="V10 (Tercel)"
ID="kylin"
VERSION_ID="V10"
PRETTY_NAME="Kylin Linux Advanced Server V10 (Tercel)"
ANSI_COLOR="0;31"
{code}
Reporter: Xiaobao Wu
h3. Description
h4. Issue Phenomenon
In some scenarios, even when a podgroup is specified via the
podGroupTemplateFile, the volcano-controllers will still create a default
podgroup, so the podgroup the user specified in the podGroupTemplateFile never
takes effect.
h4. Preliminary Conclusion
When Spark creates its podgroup later than the volcano-controllers create the
default one, the podgroup created by Spark has no effect (e.g. the
minResources configured by the user is ignored).
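The race described above can be sketched as a minimal simulation. This is not the real Spark or client-go API; the types and names (`cluster`, `controllerSeesPod`, `sparkCreatesPodGroup`, the placeholder podgroup names) are hypothetical stand-ins. The point it illustrates: whichever party annotates the pod first decides which podgroup the scheduler honors.

```go
package main

import "fmt"

// podGroup keeps only the field that matters for this report: the
// minResources coming from the user's template (0 for the default podgroup).
type podGroup struct {
	name   string
	minCPU int
}

// pod keeps only the scheduling.k8s.io/group-name annotation.
type pod struct {
	groupName string
}

type cluster struct {
	podGroups map[string]podGroup
}

// controllerSeesPod mimics volcano-controllers: a pod without a group-name
// annotation gets a default podgroup with no minResources.
func (c *cluster) controllerSeesPod(p *pod) {
	if p.groupName != "" {
		return
	}
	pg := podGroup{name: "podgroup-<pod-uid>", minCPU: 0}
	c.podGroups[pg.name] = pg
	p.groupName = pg.name
}

// sparkCreatesPodGroup mimics the Spark side: the podgroup from the
// podGroupTemplateFile (minResources cpu=2) is created, but it only governs
// the pod if the pod is not already bound to another podgroup.
func (c *cluster) sparkCreatesPodGroup(p *pod) {
	pg := podGroup{name: "spark-<app-id>-podgroup", minCPU: 2}
	c.podGroups[pg.name] = pg
	if p.groupName == "" {
		p.groupName = pg.name
	}
}

// effectiveMinCPU is the minResources the scheduler actually sees for the pod.
func effectiveMinCPU(c *cluster, p *pod) int {
	return c.podGroups[p.groupName].minCPU
}

func main() {
	// Ordering observed in this issue: the controller wins the race,
	// so the template's minResources is ignored.
	c := &cluster{podGroups: map[string]podGroup{}}
	p := &pod{}
	c.controllerSeesPod(p)
	c.sparkCreatesPodGroup(p)
	fmt.Println(p.groupName, effectiveMinCPU(c, p))

	// Intended ordering: Spark wins and the template takes effect.
	c2 := &cluster{podGroups: map[string]podGroup{}}
	p2 := &pod{}
	c2.sparkCreatesPodGroup(p2)
	c2.controllerSeesPod(p2)
	fmt.Println(p2.groupName, effectiveMinCPU(c2, p2))
}
```
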
h4. Preliminary Analysis
1. Spark job status:
{code:bash}
$ kubectl -n spark-test get po
NAME                               READY   STATUS    RESTARTS   AGE
spark-pi-5b5529971602bcd7-exec-1   0/1     Pending   0          46m
spark-pi-5d020597160284cc-driver   1/1     Running   0          47m
spark-pi-6b8f9c971602bf1f-exec-1   0/1     Pending   0          46m
spark-pi-9b535397160284a9-driver   1/1     Running   0          47m
$ kubectl -n spark-test get pg
NAME                                              STATUS    MINMEMBER   RUNNINGS   AGE
podgroup-317e69ec-9094-47cf-bcf0-cc0674f27614     Running   1           1          48m
podgroup-a0dc2097-c30a-4c9d-9d0e-b400e32979a0     Running   1           1          48m
spark-33b811a7bb1c4c92a443afb719dd00bb-podgroup   Pending   1                      48m
spark-3bc4595a02fe4d088d47516e0316c359-podgroup   Pending   1                      48m
$ kubectl -n spark-test get po -o yaml | grep group-name
      scheduling.k8s.io/group-name: podgroup-317e69ec-9094-47cf-bcf0-cc0674f27614
      scheduling.k8s.io/group-name: podgroup-a0dc2097-c30a-4c9d-9d0e-b400e32979a0
{code}
2. volcano-controllers log:
{code:bash}
I0528 16:29:13.550489 1 pg_controller.go:174] Try to create podgroup for
pod spark-test/spark-pi-5d020597160284cc-driver
I0528 16:29:13.555823 1 pg_controller.go:174] Try to create podgroup for
pod spark-test/spark-pi-9b535397160284a9-driver
{code}
3. Code review:
Spark adds the podGroup annotation (scheduling.k8s.io/group-name) in:
[spark/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/VolcanoFeatureStep.scala|https://github.com/apache/spark/blob/5103e00c4ce5fcc4264ca9c4df12295d42557af6/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/VolcanoFeatureStep.scala#L62]
The volcano-controller determines whether a podgroup already exists in:
[volcano/pkg/controllers/podgroup/pg_controller.go|https://github.com/volcano-sh/volcano/blob/f1141171980bee89114c9be3f0e3c1472848e0b0/pkg/controllers/podgroup/pg_controller.go#L168-L171]
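The gist of the linked pg_controller check can be paraphrased as below. This is a sketch, not the real client-go types or the literal volcano source: `pod`, `needsDefaultPodGroup`, and `defaultPodGroupName` are simplified stand-ins. The crux is that the controller decides purely from the pod's group-name annotation, so it cannot notice a template podgroup that Spark creates after the pod has been observed.

```go
package main

import "fmt"

const groupNameAnnotation = "scheduling.k8s.io/group-name"

// pod is a simplified stand-in for v1.Pod.
type pod struct {
	uid         string
	annotations map[string]string
}

// needsDefaultPodGroup paraphrases the existence check in pg_controller.go:
// a default podgroup is created whenever the observed pod carries no
// group-name annotation, regardless of which podgroups already exist.
func needsDefaultPodGroup(p pod) bool {
	return p.annotations[groupNameAnnotation] == ""
}

// defaultPodGroupName mirrors the podgroup-<pod-uid> naming seen in the
// kubectl outputs above.
func defaultPodGroupName(p pod) string {
	return "podgroup-" + p.uid
}

func main() {
	// A driver pod observed before Spark has annotated it.
	observed := pod{
		uid:         "a8dfd10e-deea-43bf-ae51-acdf23a0431f",
		annotations: map[string]string{},
	}
	if needsDefaultPodGroup(observed) {
		fmt.Println("Try to create podgroup", defaultPodGroupName(observed))
	}
}
```
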
h3. Steps to reproduce the issue
1. Create a queue for the Spark jobs:
{code:yaml}
# queue.yaml, used for the spark jobs
apiVersion: scheduling.volcano.sh/v1beta1
kind: Queue
metadata:
  name: spark-test
spec:
  capability:
    cpu: "2"
    memory: "2Gi"
{code}
2. Prepare podGroup template:
{code:yaml}
# podgroup-template.yaml
apiVersion: scheduling.volcano.sh/v1beta1
kind: PodGroup
spec:
  minMember: 1
  minResources:
    cpu: "2"
    memory: "2048Mi"
  queue: spark-test
{code}
3. Submit two Spark jobs with the same content in parallel:
{code:bash}
/data/spark/spark-3.3.2-bin-hadoop3/bin/spark-submit \
  --master k8s://https://127.0.0.1:6443 \
  --deploy-mode cluster \
  --name spark-pi \
  --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
  --conf spark.kubernetes.namespace=spark-test \
  --class org.apache.spark.examples.SparkPi \
  --conf spark.executor.instances=1 \
  --conf spark.executor.cores=1 \
  --conf spark.executor.memory=1G \
  --conf spark.driver.cores=1 \
  --conf spark.driver.memoryOverhead=0 \
  --conf spark.executor.memoryOverhead=0 \
  --conf spark.driver.memory=1G \
  --conf spark.kubernetes.driver.limit.cores=1 \
  --conf spark.kubernetes.executor.limit.cores=1 \
  --conf spark.kubernetes.executor.limit.memory=1G \
  --conf spark.kubernetes.container.image=spark:v3.3.2-volcano \
  --conf spark.kubernetes.scheduler.name=volcano \
  --conf spark.kubernetes.scheduler.volcano.podGroupTemplateFile=./podgroup-template.yaml \
  --conf spark.kubernetes.driver.pod.featureSteps=org.apache.spark.deploy.k8s.features.VolcanoFeatureStep \
  --conf spark.kubernetes.executor.pod.featureSteps=org.apache.spark.deploy.k8s.features.VolcanoFeatureStep \
  local:///opt/spark/examples/jars/spark-examples_2.12-3.3.2.jar 3000
{code}
4. Watch the YAML of the driver pod:
{code:bash}
$ kubectl -n spark-test get po -w -o yaml
apiVersion: v1
kind: Pod
metadata:
  annotations:
    scheduling.volcano.sh/queue-name: spark-test
  creationTimestamp: "2025-06-30T11:03:37Z"
  labels:
    spark-app-name: spark-pi
    spark-app-selector: spark-30fa0ac622a244b1a6e7e4df34b2a274
    spark-role: driver
    spark-version: 3.3.2
  name: spark-pi-f41e8097c081b4f6-driver
  namespace: spark-test
  resourceVersion: "263622003"
  uid: a8dfd10e-deea-43bf-ae51-acdf23a0431f
spec:
  containers:
  - args:
    - driver
    - --properties-file
    - /opt/spark/conf/spark.properties
    - --class
    - org.apache.spark.examples.SparkPi
    - local:///opt/spark/examples/jars/spark-examples_2.12-3.3.2.jar
    - "3000"
    image: spark:v3.3.2-volcano
    imagePullPolicy: IfNotPresent
    name: spark-kubernetes-driver
    resources:
      limits:
        cpu: "1"
        memory: 1Gi
      requests:
        cpu: "1"
        memory: 1Gi
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  nodeSelector: {}
  preemptionPolicy: PreemptLowerPriority
  priority: 0
  restartPolicy: Never
  schedulerName: volcano
  securityContext: {}
  serviceAccount: spark
  serviceAccountName: spark
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
status:
  phase: Pending
  qosClass: Guaranteed
---
apiVersion: v1
kind: Pod
metadata:
  annotations:
    scheduling.k8s.io/group-name: podgroup-a8dfd10e-deea-43bf-ae51-acdf23a0431f
    scheduling.volcano.sh/queue-name: spark-test
  creationTimestamp: "2025-06-30T11:03:37Z"
  labels:
    spark-app-name: spark-pi
    spark-app-selector: spark-30fa0ac622a244b1a6e7e4df34b2a274
    spark-role: driver
    spark-version: 3.3.2
  name: spark-pi-f41e8097c081b4f6-driver
  namespace: spark-test
  resourceVersion: "263622010"
  uid: a8dfd10e-deea-43bf-ae51-acdf23a0431f
spec:
  containers:
  - args:
    - driver
    - --properties-file
    - /opt/spark/conf/spark.properties
    - --class
    - org.apache.spark.examples.SparkPi
    - local:///opt/spark/examples/jars/spark-examples_2.12-3.3.2.jar
    - "3000"
    image: spark:v3.3.2-volcano
    imagePullPolicy: IfNotPresent
    name: spark-kubernetes-driver
    resources:
      limits:
        cpu: "1"
        memory: 1Gi
      requests:
        cpu: "1"
        memory: 1Gi
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  nodeSelector: {}
  preemptionPolicy: PreemptLowerPriority
  priority: 0
  restartPolicy: Never
  schedulerName: volcano
  securityContext: {}
  serviceAccount: spark
  serviceAccountName: spark
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
status:
  phase: Pending
  qosClass: Guaranteed
...
{code}
5. Watch the podgroup creation process:
{code:bash}
$ kubectl -n spark-test get pg -w
NAME                                              STATUS      MINMEMBER   RUNNINGS   AGE
spark-30fa0ac622a244b1a6e7e4df34b2a274-podgroup               1                      0s
podgroup-a8dfd10e-deea-43bf-ae51-acdf23a0431f     Pending     1                      0s
spark-30fa0ac622a244b1a6e7e4df34b2a274-podgroup   Pending     1                      0s
podgroup-a8dfd10e-deea-43bf-ae51-acdf23a0431f     Running     1                      0s
spark-30fa0ac622a244b1a6e7e4df34b2a274-podgroup               1                      0s
spark-30fa0ac622a244b1a6e7e4df34b2a274-podgroup   Pending     1                      1s
podgroup-a8dfd10e-deea-43bf-ae51-acdf23a0431f     Running     1           1          21s
podgroup-a8dfd10e-deea-43bf-ae51-acdf23a0431f     Running     1           2          49s
podgroup-a8dfd10e-deea-43bf-ae51-acdf23a0431f     Running     1           1          2m23s
podgroup-a8dfd10e-deea-43bf-ae51-acdf23a0431f     Running     1           1          2m25s
podgroup-a8dfd10e-deea-43bf-ae51-acdf23a0431f     Running     1           1          2m26s
podgroup-a8dfd10e-deea-43bf-ae51-acdf23a0431f     Completed   1                      2m27s
spark-30fa0ac622a244b1a6e7e4df34b2a274-podgroup   Inqueue     1                      2m27s
{code}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)