[
https://issues.apache.org/jira/browse/YUNIKORN-2940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17893701#comment-17893701
]
Xiaobao Wu commented on YUNIKORN-2940:
--------------------------------------
Yes, if the placeholder pod creation fails, it is not a problem that can be
solved at the scheduler level; what can be done is to track this and report it
gracefully to the submitter. Thank you for your reply, let me think more about
this problem.
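
One way to report the failure gracefully would be a Kubernetes event on the
originating (driver) pod. Below is a minimal sketch using client-go's event
recorder; the reason string "PlaceholderCreateFailed", the choice of the driver
pod as the event target, and the pod name are my own assumptions for
illustration, not current yunikorn-k8shim behaviour.
{code:go}
package main

import (
	"fmt"

	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/tools/record"
)

func main() {
	// FakeRecorder stands in for whatever EventRecorder the shim would use;
	// in a real cluster the event would show up in `kubectl describe pod`.
	recorder := record.NewFakeRecorder(10)

	driver := &v1.Pod{ObjectMeta: metav1.ObjectMeta{
		Name:      "spark-pi-driver", // hypothetical pod name
		Namespace: "spark-my-test",
	}}

	// Surface the placeholder creation failure on the driver pod so the
	// submitter can see it without digging through scheduler logs.
	recorder.Eventf(driver, v1.EventTypeWarning, "PlaceholderCreateFailed",
		"failed to create placeholder pod for task group %s: exceeded quota: compute-resources",
		"spark-executor")

	fmt.Println(<-recorder.Events)
}
{code}
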
> Handle placeholder pod create failure gracefully
> ------------------------------------------------
>
> Key: YUNIKORN-2940
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2940
> Project: Apache YuniKorn
> Issue Type: Bug
> Components: core - scheduler
> Affects Versions: 1.5.2
> Reporter: Xiaobao Wu
> Priority: Critical
> Attachments: Job_screenshots.png
>
>
> *Environment information*
> * resourceQuota: 4C 4G
> * driver / executor Pod: 1C 1G
> * driver / executor placeholder Pod (in task-groups): 1C 1G
>
> *Issue description*
> In the above environment, I submitted a Spark job with the following command:
> {code:java}
> /opt/spark/bin/spark-submit --master k8s://https://127.0.0.1:6443 --deploy-mode cluster --name spark-pi \
> --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
> --conf spark.kubernetes.namespace=spark-my-test \
> --class org.apache.spark.examples.SparkPi \
> --conf spark.dynamicAllocation.shuffleTracking.enabled=true \
> --conf spark.dynamicAllocation.enabled=true \
> --conf spark.dynamicAllocation.maxExecutors=10 \
> --conf spark.dynamicAllocation.minExecutors=10 \
> --conf spark.executor.cores=1 \
> --conf spark.executor.memory=600m \
> --conf spark.driver.cores=1 \
> --conf spark.driver.memory=600m \
> --conf spark.app.id={{APP_ID}} \
> --conf spark.ui.port=14040 \
> --conf spark.kubernetes.driver.limit.cores=1 \
> --conf spark.kubernetes.executor.limit.cores=1 \
> --conf spark.kubernetes.container.image=apache/spark:v3.3.0 \
> --conf spark.kubernetes.scheduler.name=yunikorn \
> --conf spark.kubernetes.driver.annotation.yunikorn.apache.org/task-group-name=spark-driver \
> --conf spark.kubernetes.driver.annotation.yunikorn.apache.org/task-groups='[{"name": "spark-driver", "minMember": 1, "minResource": {"cpu": "1", "memory": "1Gi"} }, {"name": "spark-executor", "minMember": 10, "minResource": {"cpu": "1", "memory": "1Gi"} }]' \
> --conf spark.kubernetes.driver.annotation.yunikorn.apache.org/schedulingPolicyParameters='placeholderTimeoutInSeconds=30 gangSchedulingStyle=Hard' \
> --conf spark.kubernetes.executor.annotation.yunikorn.apache.org/task-group-name=spark-executor \
> local:///opt/spark/examples/jars/spark-examples_2.12-3.3.0.jar 10000
> {code}
> After I ran this job, I found that a placeholder Pod ( i.e.
> tg-spark-96aae620780e4b40a59893-spark-executor-90qrmfkytv in the picture
> below ) still existed on Kubernetes.
> !Job_screenshots.png!
> I find it very strange that the job has completed but the placeholder Pod
> still exists.
>
> *Issue analysis*
> Looking at the logs, I found a key entry:
>
> {code:java}
> 2024-10-21T20:50:19.868+0800 INFO shim.cache.placeholder
> cache/placeholder_manager.go:99 placeholder created {"placeholder":
> "appID: spark-96aae620780e4b40a59893d850e8aad3, taskGroup: spark-executor,
> podName:
> spark-my-test/tg-spark-96aae620780e4b40a59893-spark-executor-90qrmfkytv"}
> 2024-10-21T20:50:19.880+0800 ERROR shim.cache.placeholder
> cache/placeholder_manager.go:95 failed to create placeholder pod
> {"error": "pods \"tg-spark-96aae620780e4b40a59893-spark-executor-zkpqzmw308\"
> is forbidden: exceeded quota: compute-resources, requested:
> limits.cpu=1,limits.memory=1Gi,requests.cpu=1,requests.memory=1Gi, used:
> limits.cpu=4,limits.memory=4056Mi,requests.cpu=4,requests.memory=4056Mi,
> limited: limits.cpu=4,limits.memory=4Gi,requests.cpu=4,requests.memory=4Gi"}
> github.com/apache/yunikorn-k8shim/pkg/cache.(*PlaceholderManager).createAppPlaceholders
> /opt/src/pkg/cache/placeholder_manager.go:95
> github.com/apache/yunikorn-k8shim/pkg/cache.(*Application).onReserving.func1
> /opt/src/pkg/cache/application.go:537
> 2024-10-21T20:50:19.880+0800 INFO shim.cache.placeholder
> cache/placeholder_manager.go:111 start to clean up app placeholders
> {"appID": "spark-96aae620780e4b40a59893d850e8aad3"}
> 2024-10-21T20:50:19.973+0800 DEBUG shim.context
> cache/context.go:1109 AddTask {"appID":
> "spark-96aae620780e4b40a59893d850e8aad3", "taskID":
> "08490d01-bb3b-490b-a9b0-b9bd183cccd6"}
> 2024-10-21T20:50:19.973+0800 INFO shim.context cache/context.go:1131
> task added {"appID": "spark-96aae620780e4b40a59893d850e8aad3",
> "taskID": "08490d01-bb3b-490b-a9b0-b9bd183cccd6", "taskState": "New"}
> 2024-10-21T20:50:20.058+0800 INFO shim.cache.placeholder
> cache/placeholder_manager.go:124 finished cleaning up app placeholders
> {"appID": "spark-96aae620780e4b40a59893d850e8aad3"}
> 2024-10-21T20:50:20.058+0800 DEBUG shim.fsm
> cache/application_state.go:500 shim app state transition {"app":
> "spark-96aae620780e4b40a59893d850e8aad3", "source": "Reserving",
> "destination": "Running", "event": "UpdateReservation"}
> {code}
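>
> For reference, the arithmetic behind the "exceeded quota" error above: the
> task-groups annotation asks for 1 driver placeholder + 10 executor
> placeholders, each with minResource 1 CPU / 1Gi, i.e. 11 CPU / 11Gi in total,
> while the compute-resources ResourceQuota allows only 4 CPU / 4Gi. The quota
> is therefore already fully used ( the log shows 4 CPU / 4056Mi in use ) when
> the next placeholder is attempted, so its creation is forbidden.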
> According to the above log, the sequence of events is as follows ( see the
> sketch after this list ):
> 1. The placeholder Pod
> tg-spark-96aae620780e4b40a59893-spark-executor-90qrmfkytv is created.
> 2. The createAppPlaceholders method fails for the next placeholder because of
> the ResourceQuota constraint.
> 3. *start to clean up app placeholders*
> 4. A task is added ( this task corresponds to
> tg-spark-96aae620780e4b40a59893-spark-executor-90qrmfkytv ).
> 5. finished cleaning up app placeholders
> 6. The shim app state transitions to Running.
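>
> The race in steps 3-5 can be shown with a minimal, self-contained sketch
> ( hypothetical code, not the actual yunikorn-k8shim implementation ): the
> cleanup pass works from a snapshot of the known placeholder tasks, so a task
> added concurrently is never cleaned up.
> {code:go}
> package main
>
> // Hypothetical model of the race in steps 3-5 (not the real shim code): the
> // cleanup pass deletes only the placeholder tasks it saw in its snapshot, so
> // a task added while cleanup is running survives and its pod stays Pending.
>
> import (
> 	"fmt"
> 	"sync"
> )
>
> type tracker struct {
> 	mu    sync.Mutex
> 	tasks map[string]bool
> }
>
> func (t *tracker) addTask(id string) {
> 	t.mu.Lock()
> 	defer t.mu.Unlock()
> 	t.tasks[id] = true
> }
>
> func (t *tracker) cleanup() {
> 	// Take a snapshot of the currently known placeholder tasks.
> 	t.mu.Lock()
> 	snapshot := make([]string, 0, len(t.tasks))
> 	for id := range t.tasks {
> 		snapshot = append(snapshot, id)
> 	}
> 	t.mu.Unlock()
>
> 	// Delete only the tasks from the snapshot ("delete the placeholder pod").
> 	t.mu.Lock()
> 	defer t.mu.Unlock()
> 	for _, id := range snapshot {
> 		delete(t.tasks, id)
> 	}
> }
>
> func main() {
> 	tr := &tracker{tasks: map[string]bool{"ph-created-before-failure": true}}
>
> 	var wg sync.WaitGroup
> 	wg.Add(2)
> 	go func() { defer wg.Done(); tr.cleanup() }()
> 	go func() { defer wg.Done(); tr.addTask("ph-added-during-cleanup") }()
> 	wg.Wait()
>
> 	// Depending on the interleaving, "ph-added-during-cleanup" is still here,
> 	// which mirrors the leftover Pending placeholder pod in this issue.
> 	fmt.Println("tasks left after cleanup:", tr.tasks)
> }
> {code}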
>
> *Stage conclusion*
> # When an exception ( e.g. a ResourceQuota limit ) is encountered while
> creating a placeholder Pod, the placeholder task cleanup mechanism is
> triggered.
> # The cleanup mechanism deletes all placeholder tasks under the current app.
> # If a task is added while the cleanup is in progress, it is not cleaned up.
> # After its state transitions to Running, the shim app no longer handles
> placeholder tasks ( here "handle" means pushing the InitTask event to the
> placeholder task ).
> In short, in this scenario the leftover placeholder Pod can stay Pending for
> a long time.
>
> *Follow-up improvement ideas*
> 1. Could we clean up the placeholder Tasks after the core app is completed?
> 2. Could we clean up the placeholder Tasks after the shim app transitions to
> Running?
> 3. Once the shim app state transitions to Running, stop accepting addTask
> requests for placeholder Pods and report this to the submitter via logs or
> events ( a rough sketch of this idea follows ).
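>
> A rough sketch of idea 3 ( illustrative only; the type and function names are
> invented, not the real shim API ): once the shim app is Running, an addTask
> request for a placeholder pod is rejected so the caller can log it or emit an
> event for the submitter instead of leaving the pod Pending.
> {code:go}
> package main
>
> // Illustrative-only sketch of improvement idea 3; names are invented, not
> // the real yunikorn-k8shim API.
>
> import "fmt"
>
> type appState string
>
> const (
> 	stateReserving appState = "Reserving"
> 	stateRunning   appState = "Running"
> )
>
> type task struct {
> 	id            string
> 	isPlaceholder bool
> }
>
> // addTask refuses a placeholder task once the app has already moved to
> // Running; the returned error is what would be surfaced to the submitter as
> // a log line or a Kubernetes event.
> func addTask(state appState, t task) error {
> 	if t.isPlaceholder && state == stateRunning {
> 		return fmt.Errorf("rejecting placeholder task %s: app is already %s, no InitTask will be sent", t.id, state)
> 	}
> 	// normal path: register the task and push the InitTask event
> 	return nil
> }
>
> func main() {
> 	err := addTask(stateRunning, task{
> 		id:            "tg-spark-96aae620780e4b40a59893-spark-executor-90qrmfkytv",
> 		isPlaceholder: true,
> 	})
> 	if err != nil {
> 		fmt.Println(err) // feedback to the submitter via logs/events
> 	}
> }
> {code}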
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]