[ 
https://issues.apache.org/jira/browse/YUNIKORN-2940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaobao Wu updated YUNIKORN-2940:
---------------------------------
    Description: 
*Environment information*
 * ResourceQuota: 4C / 4G (see the inspection sketch below)
 * driver / executor Pod: 1C / 1G
 * driver / executor placeholder (ph) Pod (in task-groups): 1C / 1G
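For checking the quota numbers that show up in the "exceeded quota" error later in the logs, a small client-go sketch like the one below can dump the namespace's ResourceQuota usage. This is only a minimal sketch: the kubeconfig path is an assumption, and the namespace is taken from the job configuration above.
{code:go}
// quota_check.go - minimal sketch for inspecting ResourceQuota usage in the
// job namespace. The kubeconfig path is an assumption, not part of this report.
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", "/root/.kube/config")
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	// List all ResourceQuotas in the Spark job namespace and print used vs hard limits.
	quotas, err := client.CoreV1().ResourceQuotas("spark-my-test").
		List(context.TODO(), metav1.ListOptions{})
	if err != nil {
		panic(err)
	}
	for _, q := range quotas.Items {
		for name, hard := range q.Status.Hard {
			used := q.Status.Used[name]
			fmt.Printf("%s/%s: used %s of %s\n", q.Name, name, used.String(), hard.String())
		}
	}
}
{code}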

*Issue description*

In the above environment, I submitted a Spark job with the following configuration:
{code:bash}
/opt/spark/bin/spark-submit --master k8s://https://127.0.0.1:6443 --deploy-mode cluster --name spark-pi \
  --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
  --conf spark.kubernetes.namespace=spark-my-test \
  --class org.apache.spark.examples.SparkPi \
  --conf spark.dynamicAllocation.shuffleTracking.enabled=true \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.dynamicAllocation.maxExecutors=10 \
  --conf spark.dynamicAllocation.minExecutors=10 \
  --conf spark.executor.cores=1 \
  --conf spark.executor.memory=600m \
  --conf spark.driver.cores=1 \
  --conf spark.driver.memory=600m \
  --conf spark.app.id={{APP_ID}} \
  --conf spark.ui.port=14040 \
  --conf spark.kubernetes.driver.limit.cores=1 \
  --conf spark.kubernetes.executor.limit.cores=1 \
  --conf spark.kubernetes.container.image=apache/spark:v3.3.0 \
  --conf spark.kubernetes.scheduler.name=yunikorn \
  --conf spark.kubernetes.driver.annotation.yunikorn.apache.org/task-group-name=spark-driver \
  --conf spark.kubernetes.driver.annotation.yunikorn.apache.org/task-groups='[{"name": "spark-driver", "minMember": 1, "minResource": {"cpu": "1", "memory": "1Gi"} }, {"name": "spark-executor", "minMember": 10, "minResource": {"cpu": "1", "memory": "1Gi"} }]' \
  --conf spark.kubernetes.driver.annotation.yunikorn.apache.org/schedulingPolicyParameters='placeholderTimeoutInSeconds=30 gangSchedulingStyle=Hard' \
  --conf spark.kubernetes.executor.annotation.yunikorn.apache.org/task-group-name=spark-executor \
  local:///opt/spark/examples/jars/spark-examples_2.12-3.3.0.jar 10000 {code}
After running this job, I found that the ph Pod (i.e. tg-spark-96aae620780e4b40a59893-spark-executor-90qrmfkytv in the picture below) still exists on K8s.
!http://www.kdocs.cn/api/v3/office/copy/NjNBZFlyNDdCMXRSZEp0cEdTYVVocE94MkY3OVZzTHNMM2oyWFQ0ZVY2K0x6eE9qNTNDMDFzN3Z3QzA1ZCtrdEdwbU9FNm5xUHI4cTZKTDV6dnYvWHFjZUZlYjJMZS9UYXBrMERSVWYxNkhhd0pycnNkVEtxblh0d212K3dQZ0o0eXB6VWVEanlJbTRnSGlYVG12YUNrbS9tRnZzMkNneU82aGNWZzNIYmNVQmlnbmlVZ0VNS1lJZ0NNQzBSKzYwbGJ5SVd5MXFwSjhZUFllb2Rwc0Q1UCtwMlh4WkljSWxQN2FEczVBODhRdk5pSlVOcVllZVNjaklWVGFhQ0paaC9DZUpXS1hDRldrPQ==/attach/object/4BQON4Y3AAADQ?|width=664!

It is very strange to me that the ph Pod still exists even though the job has already completed.
 
*Issue analysis*
By examining the logs, I found the following key entries:
 
{code:java}
2024-10-21T20:50:19.868+0800    INFO    shim.cache.placeholder    cache/placeholder_manager.go:99    placeholder created    {"placeholder": "appID: spark-96aae620780e4b40a59893d850e8aad3, taskGroup: spark-executor, podName: spark-my-test/tg-spark-96aae620780e4b40a59893-spark-executor-90qrmfkytv"}
2024-10-21T20:50:19.880+0800    ERROR    shim.cache.placeholder    cache/placeholder_manager.go:95    failed to create placeholder pod    {"error": "pods \"tg-spark-96aae620780e4b40a59893-spark-executor-zkpqzmw308\" is forbidden: exceeded quota: compute-resources, requested: limits.cpu=1,limits.memory=1Gi,requests.cpu=1,requests.memory=1Gi, used: limits.cpu=4,limits.memory=4056Mi,requests.cpu=4,requests.memory=4056Mi, limited: limits.cpu=4,limits.memory=4Gi,requests.cpu=4,requests.memory=4Gi"}
github.com/apache/yunikorn-k8shim/pkg/cache.(*PlaceholderManager).createAppPlaceholders
    /opt/src/pkg/cache/placeholder_manager.go:95
github.com/apache/yunikorn-k8shim/pkg/cache.(*Application).onReserving.func1
    /opt/src/pkg/cache/application.go:537
2024-10-21T20:50:19.880+0800    INFO    shim.cache.placeholder    cache/placeholder_manager.go:111    start to clean up app placeholders    {"appID": "spark-96aae620780e4b40a59893d850e8aad3"}
2024-10-21T20:50:19.973+0800    DEBUG    shim.context    cache/context.go:1109    AddTask    {"appID": "spark-96aae620780e4b40a59893d850e8aad3", "taskID": "08490d01-bb3b-490b-a9b0-b9bd183cccd6"}
2024-10-21T20:50:19.973+0800    INFO    shim.context    cache/context.go:1131    task added    {"appID": "spark-96aae620780e4b40a59893d850e8aad3", "taskID": "08490d01-bb3b-490b-a9b0-b9bd183cccd6", "taskState": "New"}
2024-10-21T20:50:20.058+0800    INFO    shim.cache.placeholder    cache/placeholder_manager.go:124    finished cleaning up app placeholders    {"appID": "spark-96aae620780e4b40a59893d850e8aad3"}
2024-10-21T20:50:20.058+0800    DEBUG    shim.fsm    cache/application_state.go:500    shim app state transition    {"app": "spark-96aae620780e4b40a59893d850e8aad3", "source": "Reserving", "destination": "Running", "event": "UpdateReservation"}
{code}
According to the above log, the sequence of events is as follows (a simplified sketch of this interleaving follows the list):
1. The ph Pod tg-spark-96aae620780e4b40a59893-spark-executor-90qrmfkytv is created.
2. createAppPlaceholders fails because the ResourceQuota is exhausted (the Pods and placeholders created so far already use the full 4C quota, so the next 1C / 1Gi placeholder request is forbidden).
3. *start to clean up app placeholders*
4. A task is added (this task corresponds to tg-spark-96aae620780e4b40a59893-spark-executor-90qrmfkytv).
5. finished cleaning up app placeholders
6. The shim app state transitions to Running.
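Based on steps 3-5, my understanding is that the cleanup works on the set of tasks it sees when it starts, so a ph task added while the cleanup is running slips through, and the app then moves to Running without ever sending it an InitTask event. The following is a simplified Go sketch of that interleaving; the types and functions are illustrative only, not the actual yunikorn-k8shim code.
{code:go}
// race_sketch.go - illustrative model of the cleanup / AddTask race
// (hypothetical types; not the actual yunikorn-k8shim implementation).
package main

import (
	"fmt"
	"sync"
)

type app struct {
	mu    sync.Mutex
	tasks map[string]string // taskID -> state
}

// cleanupPlaceholders deletes only the tasks that existed when its snapshot
// was taken, mirroring "start to clean up app placeholders".
func (a *app) cleanupPlaceholders() {
	a.mu.Lock()
	snapshot := make([]string, 0, len(a.tasks))
	for id := range a.tasks {
		snapshot = append(snapshot, id)
	}
	a.mu.Unlock()

	for _, id := range snapshot {
		a.mu.Lock()
		delete(a.tasks, id)
		a.mu.Unlock()
	}
}

// addTask mirrors the AddTask call at 20:50:19.973: a new ph task can land
// in the map after the cleanup snapshot was taken.
func (a *app) addTask(id string) {
	a.mu.Lock()
	a.tasks[id] = "New"
	a.mu.Unlock()
}

func main() {
	a := &app{tasks: map[string]string{"ph-1": "New", "ph-2": "New"}}

	done := make(chan struct{})
	go func() {
		a.cleanupPlaceholders()
		close(done)
	}()
	a.addTask("ph-3") // the late ph task from step 4
	<-done

	// Depending on timing, "ph-3" survives the cleanup; the app then turns
	// Running and never pushes InitTask to it -> the ph Pod stays Pending.
	fmt.Println("remaining tasks:", a.tasks)
}
{code}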
 

*Stage conclusion*
 # When an exception (e.g., a ResourceQuota limit) is encountered while creating a ph Pod, the ph task cleanup mechanism is triggered.
 # The ph Task cleanup mechanism deletes all tasks under the current app.
 # A Task added while the cleanup is in progress will not be cleaned up.
 # Once the shim app state has transitioned to Running, it no longer handles ph Tasks (here, "handle" means pushing the InitTask event to the ph Task).

In short, in this scenario the ph Pod can stay Pending for a long time.

 

*Follow-up improvement ideas*

1. Could we clean up the ph Tasks after the core app is completed?

2. Could we clean up the ph Tasks after the shim app transitions to Running?

3. Once the shim app state has transitioned to Running, stop accepting addTask requests for ph Pods, and report this back to the submitter via logs or events (see the sketch below).
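
As a rough illustration of idea 3, the shim could refuse (or immediately release) placeholder tasks that arrive after the app has left the Reserving state, and surface that decision to the submitter. The sketch below uses hypothetical names and is not a patch against the actual yunikorn-k8shim code:
{code:go}
// addtask_guard.go - hypothetical sketch of improvement idea 3
// (names are illustrative, not the real yunikorn-k8shim API).
package main

import "log"

type appState string

const (
	stateReserving appState = "Reserving"
	stateRunning   appState = "Running"
)

type task struct {
	id          string
	placeholder bool
}

type shimApp struct {
	id    string
	state appState
	tasks map[string]*task
}

// addTask refuses placeholder tasks that arrive after gang scheduling is
// over, and reports the decision so the submitter can see why the ph Pod
// will never be scheduled.
func (a *shimApp) addTask(t *task) {
	if t.placeholder && a.state != stateReserving {
		// In a real fix this is also where the ph Pod could be deleted or
		// released instead of being left Pending, and a Kubernetes event
		// could be emitted in addition to the log line.
		log.Printf("app %s: rejected late placeholder task %s (state=%s)",
			a.id, t.id, a.state)
		return
	}
	a.tasks[t.id] = t
}

func main() {
	app := &shimApp{
		id:    "spark-96aae620780e4b40a59893d850e8aad3",
		state: stateRunning,
		tasks: map[string]*task{},
	}
	app.addTask(&task{
		id:          "tg-spark-96aae620780e4b40a59893-spark-executor-90qrmfkytv",
		placeholder: true,
	})
}
{code}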


> ph Pod is in a pending state for a long time
> --------------------------------------------
>
>                 Key: YUNIKORN-2940
>                 URL: https://issues.apache.org/jira/browse/YUNIKORN-2940
>             Project: Apache YuniKorn
>          Issue Type: Bug
>          Components: core - scheduler
>    Affects Versions: 1.5.2
>            Reporter: Xiaobao Wu
>            Priority: Critical
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
