[
https://issues.apache.org/jira/browse/YUNIKORN-2940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17893701#comment-17893701
]
Xiaobao Wu commented on YUNIKORN-2940:
--------------------------------------
Yes, if the placeholder pod creation fails, it is not a problem that can be
solved at the scheduler level; what can be done is to track this and report it
gracefully to the submitter. Thank you for your reply, let me think more about
this problem.
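
One way to report the failure gracefully would be a Kubernetes event on the
originating (driver) pod. Below is a minimal sketch using client-go's event
recorder; the reason string "PlaceholderCreateFailed", the choice of the driver
pod as the event target, and the pod name are my own assumptions for
illustration, not current yunikorn-k8shim behaviour.
{code:go}
package main

import (
	"fmt"

	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/tools/record"
)

func main() {
	// FakeRecorder stands in for whatever EventRecorder the shim would use;
	// in a real cluster the event would show up in `kubectl describe pod`.
	recorder := record.NewFakeRecorder(10)

	driver := &v1.Pod{ObjectMeta: metav1.ObjectMeta{
		Name:      "spark-pi-driver", // hypothetical pod name
		Namespace: "spark-my-test",
	}}

	// Surface the placeholder creation failure on the driver pod so the
	// submitter can see it without digging through scheduler logs.
	recorder.Eventf(driver, v1.EventTypeWarning, "PlaceholderCreateFailed",
		"failed to create placeholder pod for task group %s: exceeded quota: compute-resources",
		"spark-executor")

	fmt.Println(<-recorder.Events)
}
{code}
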
> Handle placeholder pod create failure gracefully
> ------------------------------------------------
>
> Key: YUNIKORN-2940
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2940
> Project: Apache YuniKorn
> Issue Type: Bug
> Components: core - scheduler
> Affects Versions: 1.5.2
> Reporter: Xiaobao Wu
> Priority: Critical
> Attachments: Job_screenshots.png
>
>
> *Environment information*
> * resourceQuota: 4C 4G
> * driver / executor Pod: 1C 1G
> * driver / executor placeholder Pod (in task-groups): 1C 1G
>
> *Issue description*
> In the above environment, I submitted a Spark job with the following command:
> {code:java}
> /opt/spark/bin/spark-submit --master k8s://https://127.0.0.1:6443 --deploy-mode cluster --name spark-pi \
> --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
> --conf spark.kubernetes.namespace=spark-my-test \
> --class org.apache.spark.examples.SparkPi \
> --conf spark.dynamicAllocation.shuffleTracking.enabled=true \
> --conf spark.dynamicAllocation.enabled=true \
> --conf spark.dynamicAllocation.maxExecutors=10 \
> --conf spark.dynamicAllocation.minExecutors=10 \
> --conf spark.executor.cores=1 \
> --conf spark.executor.memory=600m \
> --conf spark.driver.cores=1 \
> --conf spark.driver.memory=600m \
> --conf spark.app.id={{APP_ID}} \
> --conf spark.ui.port=14040 \
> --conf spark.kubernetes.driver.limit.cores=1 \
> --conf spark.kubernetes.executor.limit.cores=1 \
> --conf spark.kubernetes.container.image=apache/spark:v3.3.0 \
> --conf spark.kubernetes.scheduler.name=yunikorn \
> --conf spark.kubernetes.driver.annotation.yunikorn.apache.org/task-group-name=spark-driver \
> --conf spark.kubernetes.driver.annotation.yunikorn.apache.org/task-groups='[{"name": "spark-driver", "minMember": 1, "minResource": {"cpu": "1", "memory": "1Gi"} }, {"name": "spark-executor", "minMember": 10, "minResource": {"cpu": "1", "memory": "1Gi"} }]' \
> --conf spark.kubernetes.driver.annotation.yunikorn.apache.org/schedulingPolicyParameters='placeholderTimeoutInSeconds=30 gangSchedulingStyle=Hard' \
> --conf spark.kubernetes.executor.annotation.yunikorn.apache.org/task-group-name=spark-executor \
> local:///opt/spark/examples/jars/spark-examples_2.12-3.3.0.jar 10000
> {code}
> After I ran this job, I found that a placeholder Pod ( i.e.
> tg-spark-96aae620780e4b40a59893-spark-executor-90qrmfkytv in the picture
> below ) still existed on Kubernetes.
> !Job_screenshots.png!
> I find it very strange that the job has completed but the placeholder Pod
> still exists.
>
> *Issue analysis*
> Looking at the logs, I found a key entry:
>
> {code:java}
> 2024-10-21T20:50:19.868+0800 INFO shim.cache.placeholder
> cache/placeholder_manager.go:99 placeholder created {"placeholder":
> "appID: spark-96aae620780e4b40a59893d850e8aad3, taskGroup: spark-executor,
> podName:
> spark-my-test/tg-spark-96aae620780e4b40a59893-spark-executor-90qrmfkytv"}
> 2024-10-21T20:50:19.880+0800 ERROR shim.cache.placeholder
> cache/placeholder_manager.go:95 failed to create placeholder pod
> {"error": "pods \"tg-spark-96aae620780e4b40a59893-spark-executor-zkpqzmw308\"
> is forbidden: exceeded quota: compute-resources, requested:
> limits.cpu=1,limits.memory=1Gi,requests.cpu=1,requests.memory=1Gi, used:
> limits.cpu=4,limits.memory=4056Mi,requests.cpu=4,requests.memory=4056Mi,
> limited: limits.cpu=4,limits.memory=4Gi,requests.cpu=4,requests.memory=4Gi"}
> github.com/apache/yunikorn-k8shim/pkg/cache.(*PlaceholderManager).createAppPlaceholders
> /opt/src/pkg/cache/placeholder_manager.go:95
> github.com/apache/yunikorn-k8shim/pkg/cache.(*Application).onReserving.func1
> /opt/src/pkg/cache/application.go:537
> 2024-10-21T20:50:19.880+0800 INFO shim.cache.placeholder
> cache/placeholder_manager.go:111 start to clean up app placeholders
> {"appID": "spark-96aae620780e4b40a59893d850e8aad3"}
> 2024-10-21T20:50:19.973+0800 DEBUG shim.context
> cache/context.go:1109 AddTask {"appID":
> "spark-96aae620780e4b40a59893d850e8aad3", "taskID":
> "08490d01-bb3b-490b-a9b0-b9bd183cccd6"}
> 2024-10-21T20:50:19.973+0800 INFO shim.context cache/context.go:1131
> task added {"appID": "spark-96aae620780e4b40a59893d850e8aad3",
> "taskID": "08490d01-bb3b-490b-a9b0-b9bd183cccd6", "taskState": "New"}
> 2024-10-21T20:50:20.058+0800 INFO shim.cache.placeholder
> cache/placeholder_manager.go:124 finished cleaning up app placeholders
> {"appID": "spark-96aae620780e4b40a59893d850e8aad3"}
> 2024-10-21T20:50:20.058+0800 DEBUG shim.fsm
> cache/application_state.go:500 shim app state transition {"app":
> "spark-96aae620780e4b40a59893d850e8aad3", "source": "Reserving",
> "destination": "Running", "event": "UpdateReservation"}
> {code}
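>
> For reference, the arithmetic behind the "exceeded quota" error above: the
> task-groups annotation asks for 1 driver placeholder + 10 executor
> placeholders, each with minResource 1 CPU / 1Gi, i.e. 11 CPU / 11Gi in total,
> while the compute-resources ResourceQuota allows only 4 CPU / 4Gi. The quota
> is therefore already fully used ( the log shows 4 CPU / 4056Mi in use ) when
> the next placeholder is attempted, so its creation is forbidden.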
> According to the above log, the sequence of events is as follows ( see the
> sketch after this list ):
> 1. The placeholder Pod
> tg-spark-96aae620780e4b40a59893-spark-executor-90qrmfkytv is created.
> 2. The createAppPlaceholders method fails for the next placeholder because of
> the ResourceQuota constraint.
> 3. *start to clean up app placeholders*
> 4. A task is added ( this task corresponds to
> tg-spark-96aae620780e4b40a59893-spark-executor-90qrmfkytv ).
> 5. finished cleaning up app placeholders
> 6. The shim app state transitions to Running.
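>
> The race in steps 3-5 can be shown with a minimal, self-contained sketch
> ( hypothetical code, not the actual yunikorn-k8shim implementation ): the
> cleanup pass works from a snapshot of the known placeholder tasks, so a task
> added concurrently is never cleaned up.
> {code:go}
> package main
>
> // Hypothetical model of the race in steps 3-5 (not the real shim code): the
> // cleanup pass deletes only the placeholder tasks it saw in its snapshot, so
> // a task added while cleanup is running survives and its pod stays Pending.
>
> import (
> 	"fmt"
> 	"sync"
> )
>
> type tracker struct {
> 	mu    sync.Mutex
> 	tasks map[string]bool
> }
>
> func (t *tracker) addTask(id string) {
> 	t.mu.Lock()
> 	defer t.mu.Unlock()
> 	t.tasks[id] = true
> }
>
> func (t *tracker) cleanup() {
> 	// Take a snapshot of the currently known placeholder tasks.
> 	t.mu.Lock()
> 	snapshot := make([]string, 0, len(t.tasks))
> 	for id := range t.tasks {
> 		snapshot = append(snapshot, id)
> 	}
> 	t.mu.Unlock()
>
> 	// Delete only the tasks from the snapshot ("delete the placeholder pod").
> 	t.mu.Lock()
> 	defer t.mu.Unlock()
> 	for _, id := range snapshot {
> 		delete(t.tasks, id)
> 	}
> }
>
> func main() {
> 	tr := &tracker{tasks: map[string]bool{"ph-created-before-failure": true}}
>
> 	var wg sync.WaitGroup
> 	wg.Add(2)
> 	go func() { defer wg.Done(); tr.cleanup() }()
> 	go func() { defer wg.Done(); tr.addTask("ph-added-during-cleanup") }()
> 	wg.Wait()
>
> 	// Depending on the interleaving, "ph-added-during-cleanup" is still here,
> 	// which mirrors the leftover Pending placeholder pod in this issue.
> 	fmt.Println("tasks left after cleanup:", tr.tasks)
> }
> {code}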
>
> *Stage conclusion*
> # When an exception ( e.g. a ResourceQuota limit ) is encountered while
> creating a placeholder Pod, the placeholder task cleanup mechanism is
> triggered.
> # The cleanup mechanism deletes all placeholder tasks under the current app.
> # If a task is added while the cleanup is in progress, it is not cleaned up.
> # After its state transitions to Running, the shim app no longer handles
> placeholder tasks ( here "handle" means pushing the InitTask event to the
> placeholder task ).
> In short, in this scenario the leftover placeholder Pod can stay Pending for
> a long time.
>
> *Follow-up improvement ideas*
> 1. Could we clean up the placeholder Tasks after the core app is completed?
> 2. Could we clean up the placeholder Tasks after the shim app transitions to
> Running?
> 3. Once the shim app state transitions to Running, stop accepting addTask
> requests for placeholder Pods and report this to the submitter via logs or
> events ( a rough sketch of this idea follows ).
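>
> A rough sketch of idea 3 ( illustrative only; the type and function names are
> invented, not the real shim API ): once the shim app is Running, an addTask
> request for a placeholder pod is rejected so the caller can log it or emit an
> event for the submitter instead of leaving the pod Pending.
> {code:go}
> package main
>
> // Illustrative-only sketch of improvement idea 3; names are invented, not
> // the real yunikorn-k8shim API.
>
> import "fmt"
>
> type appState string
>
> const (
> 	stateReserving appState = "Reserving"
> 	stateRunning   appState = "Running"
> )
>
> type task struct {
> 	id            string
> 	isPlaceholder bool
> }
>
> // addTask refuses a placeholder task once the app has already moved to
> // Running; the returned error is what would be surfaced to the submitter as
> // a log line or a Kubernetes event.
> func addTask(state appState, t task) error {
> 	if t.isPlaceholder && state == stateRunning {
> 		return fmt.Errorf("rejecting placeholder task %s: app is already %s, no InitTask will be sent", t.id, state)
> 	}
> 	// normal path: register the task and push the InitTask event
> 	return nil
> }
>
> func main() {
> 	err := addTask(stateRunning, task{
> 		id:            "tg-spark-96aae620780e4b40a59893-spark-executor-90qrmfkytv",
> 		isPlaceholder: true,
> 	})
> 	if err != nil {
> 		fmt.Println(err) // feedback to the submitter via logs/events
> 	}
> }
> {code}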
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]