[ https://issues.apache.org/jira/browse/YUNIKORN-2940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiaobao Wu updated YUNIKORN-2940:
---------------------------------
    Issue Type: Improvement  (was: Bug)

> Handle placeholder pod create failure gracefully
> ------------------------------------------------
>
>                 Key: YUNIKORN-2940
>                 URL: https://issues.apache.org/jira/browse/YUNIKORN-2940
>             Project: Apache YuniKorn
>          Issue Type: Improvement
>          Components: core - scheduler
>    Affects Versions: 1.5.2
>            Reporter: Xiaobao Wu
>            Priority: Critical
>         Attachments: Job_screenshots.png
>
>
> *Environment information*
>  * resourceQuota: 4C 4G
>  * driver / executor Pod: 1C 1G
>  * driver / executor placeholder (ph) Pod (in task-groups): 1C 1G
>
> *Issue description*
> In the above environment, I submitted a Spark job with the following command:
> {code:bash}
> /opt/spark/bin/spark-submit --master k8s://https://127.0.0.1:6443 --deploy-mode cluster --name spark-pi \
>   --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
>   --conf spark.kubernetes.namespace=spark-my-test \
>   --class org.apache.spark.examples.SparkPi \
>   --conf spark.dynamicAllocation.shuffleTracking.enabled=true \
>   --conf spark.dynamicAllocation.enabled=true \
>   --conf spark.dynamicAllocation.maxExecutors=10 \
>   --conf spark.dynamicAllocation.minExecutors=10 \
>   --conf spark.executor.cores=1 \
>   --conf spark.executor.memory=600m \
>   --conf spark.driver.cores=1 \
>   --conf spark.driver.memory=600m \
>   --conf spark.app.id={{APP_ID}} \
>   --conf spark.ui.port=14040 \
>   --conf spark.kubernetes.driver.limit.cores=1 \
>   --conf spark.kubernetes.executor.limit.cores=1 \
>   --conf spark.kubernetes.container.image=apache/spark:v3.3.0 \
>   --conf spark.kubernetes.scheduler.name=yunikorn \
>   --conf spark.kubernetes.driver.annotation.yunikorn.apache.org/task-group-name=spark-driver \
>   --conf spark.kubernetes.driver.annotation.yunikorn.apache.org/task-groups='[{"name": "spark-driver", "minMember": 1, "minResource": {"cpu": "1", "memory": "1Gi"}}, {"name": "spark-executor", "minMember": 10, "minResource": {"cpu": "1", "memory": "1Gi"}}]' \
>   --conf spark.kubernetes.driver.annotation.yunikorn.apache.org/schedulingPolicyParameters='placeholderTimeoutInSeconds=30 gangSchedulingStyle=Hard' \
>   --conf spark.kubernetes.executor.annotation.yunikorn.apache.org/task-group-name=spark-executor \
>   local:///opt/spark/examples/jars/spark-examples_2.12-3.3.0.jar 10000
> {code}
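> (For reference: this task-groups annotation asks for 1 driver placeholder plus 10 executor placeholders at 1C/1Gi each, i.e. 11C/11Gi in total, which clearly exceeds the 4C 4G namespace resourceQuota, so some placeholder creations are bound to be rejected.)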
> After running this job, I found that a ph Pod (i.e. 
> tg-spark-96aae620780e4b40a59893-spark-executor-90qrmfkytv in the following 
> picture) still exists on K8s.
> !Job_screenshots.png!
> It is strange that the job has already completed, yet the ph Pod still 
> exists.
>  
> *Issue analysis*
> Looking at the logs, I found the following key entries:
>  
> {code:none}
> 2024-10-21T20:50:19.868+0800    INFO    shim.cache.placeholder    cache/placeholder_manager.go:99    placeholder created    {"placeholder": "appID: spark-96aae620780e4b40a59893d850e8aad3, taskGroup: spark-executor, podName: spark-my-test/tg-spark-96aae620780e4b40a59893-spark-executor-90qrmfkytv"}
> 2024-10-21T20:50:19.880+0800    ERROR    shim.cache.placeholder    cache/placeholder_manager.go:95    failed to create placeholder pod    {"error": "pods \"tg-spark-96aae620780e4b40a59893-spark-executor-zkpqzmw308\" is forbidden: exceeded quota: compute-resources, requested: limits.cpu=1,limits.memory=1Gi,requests.cpu=1,requests.memory=1Gi, used: limits.cpu=4,limits.memory=4056Mi,requests.cpu=4,requests.memory=4056Mi, limited: limits.cpu=4,limits.memory=4Gi,requests.cpu=4,requests.memory=4Gi"}
> github.com/apache/yunikorn-k8shim/pkg/cache.(*PlaceholderManager).createAppPlaceholders
>     /opt/src/pkg/cache/placeholder_manager.go:95
> github.com/apache/yunikorn-k8shim/pkg/cache.(*Application).onReserving.func1
>     /opt/src/pkg/cache/application.go:537
> 2024-10-21T20:50:19.880+0800    INFO    shim.cache.placeholder    cache/placeholder_manager.go:111    start to clean up app placeholders    {"appID": "spark-96aae620780e4b40a59893d850e8aad3"}
> 2024-10-21T20:50:19.973+0800    DEBUG    shim.context    cache/context.go:1109    AddTask    {"appID": "spark-96aae620780e4b40a59893d850e8aad3", "taskID": "08490d01-bb3b-490b-a9b0-b9bd183cccd6"}
> 2024-10-21T20:50:19.973+0800    INFO    shim.context    cache/context.go:1131    task added    {"appID": "spark-96aae620780e4b40a59893d850e8aad3", "taskID": "08490d01-bb3b-490b-a9b0-b9bd183cccd6", "taskState": "New"}
> 2024-10-21T20:50:20.058+0800    INFO    shim.cache.placeholder    cache/placeholder_manager.go:124    finished cleaning up app placeholders    {"appID": "spark-96aae620780e4b40a59893d850e8aad3"}
> 2024-10-21T20:50:20.058+0800    DEBUG    shim.fsm    cache/application_state.go:500    shim app state transition    {"app": "spark-96aae620780e4b40a59893d850e8aad3", "source": "Reserving", "destination": "Running", "event": "UpdateReservation"}
> {code}
> According to the above log, the sequence of events is as follows:
> 1. The ph Pod tg-spark-96aae620780e4b40a59893-spark-executor-90qrmfkytv is created.
> 2. createAppPlaceholders fails for the next ph Pod (tg-spark-96aae620780e4b40a59893-spark-executor-zkpqzmw308) because of the resourceQuota limit.
> 3. *start to clean up app placeholders*
> 4. An AddTask request is processed (this task corresponds to tg-spark-96aae620780e4b40a59893-spark-executor-90qrmfkytv).
> 5. finished cleaning up app placeholders
> 6. The shim app state transitions from Reserving to Running.
>  
> *Stage conclusion*
>  # When an exception (e.g., a resourceQuota limit) is hit while creating a ph Pod, the ph task cleanup mechanism is triggered.
>  # The ph task cleanup mechanism deletes all tasks under the current app.
>  # A task that is added while the cleanup is already in progress is not cleaned up.
>  # Once the shim app state has transitioned to Running, the shim no longer handles ph tasks (here "handle" means pushing the InitTask event to the ph task).
> In short, in this scenario the leftover ph Pod can stay Pending for a long time; a simplified sketch of the race follows this list.
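> The following is a minimal, self-contained Go sketch of that race. It is not the actual yunikorn-k8shim code; the types, field names and helper functions are invented purely to illustrate the ordering seen in the log:
> {code:go}
> package main
>
> import "fmt"
>
> // app is a made-up stand-in for the shim application; taskID -> ph Pod name.
> type app struct {
>     tasks map[string]string
> }
>
> // snapshotTasks models the start of the cleanup: it only sees the tasks
> // that exist at that moment (step 3 in the sequence above).
> func (a *app) snapshotTasks() []string {
>     ids := make([]string, 0, len(a.tasks))
>     for id := range a.tasks {
>         ids = append(ids, id)
>     }
>     return ids
> }
>
> // finishCleanup deletes only the snapshotted tasks (step 5 above).
> func (a *app) finishCleanup(ids []string) {
>     for _, id := range ids {
>         delete(a.tasks, id) // stands in for deleting the placeholder pod
>     }
> }
>
> func main() {
>     a := &app{tasks: map[string]string{}}
>
>     // Step 3: cleanup starts. The shim task for the already-created ph Pod
>     // has not been added yet, so the snapshot is empty.
>     snapshot := a.snapshotTasks()
>
>     // Step 4: AddTask for the already-created ph Pod arrives mid-cleanup.
>     a.tasks["08490d01"] = "tg-spark-...-spark-executor-90qrmfkytv"
>
>     // Step 5: cleanup finishes, but only over the earlier snapshot.
>     a.finishCleanup(snapshot)
>
>     // The late-added ph task survives and its pod is left Pending.
>     fmt.Println("leftover ph tasks:", len(a.tasks)) // prints: leftover ph tasks: 1
> }
> {code}
> Whatever the real cleanup iterates over, the effect observed in the log is the same: the task added at 20:50:19.973 survives the cleanup that finishes at 20:50:20.058.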
>  
> *Follow-up improvement ideas*
> 1. Could we clean up the ph tasks after the core app has completed?
> 2. Could we clean up the ph tasks after the shim app transitions to Running?
> 3. Once the shim app state has transitioned to Running, stop accepting AddTask requests for ph Pods and report the rejection back to the submitter via logs or events; a rough sketch of this idea follows.
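> A rough Go sketch of idea 3, assuming a hypothetical addTask entry point; this is not the existing shim AddTask implementation, and all names below are made up:
> {code:go}
> package main
>
> import (
>     "errors"
>     "fmt"
> )
>
> // taskRequest is a made-up stand-in for the data AddTask receives.
> type taskRequest struct {
>     appID         string
>     taskID        string
>     isPlaceholder bool
> }
>
> // shimApp is a made-up stand-in for the shim application state machine.
> type shimApp struct {
>     state string // e.g. "Reserving", "Running"
> }
>
> // addTask rejects placeholder tasks once the app is already Running, so a
> // ph Pod that slipped past the cleanup is surfaced instead of staying Pending.
> func (a *shimApp) addTask(req taskRequest) error {
>     if req.isPlaceholder && a.state == "Running" {
>         // A real change would also log this and publish a K8s event so the
>         // submitter can see why the placeholder was rejected.
>         return errors.New("placeholder task rejected: app " + req.appID + " is already Running")
>     }
>     // ... normal AddTask handling would go here ...
>     return nil
> }
>
> func main() {
>     app := &shimApp{state: "Running"}
>     err := app.addTask(taskRequest{appID: "spark-96aae620780e4b40a59893d850e8aad3", taskID: "08490d01", isPlaceholder: true})
>     fmt.Println(err)
> }
> {code}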



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
