[
https://issues.apache.org/jira/browse/YUNIKORN-1596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Chaoran Yu updated YUNIKORN-1596:
---------------------------------
Affects Version/s: 1.2.0
> Pods marked unschedulable when dynamic PVC times out
> ----------------------------------------------------
>
> Key: YUNIKORN-1596
> URL: https://issues.apache.org/jira/browse/YUNIKORN-1596
> Project: Apache YuniKorn
> Issue Type: Bug
> Affects Versions: 1.2.0
> Reporter: Praveen
> Priority: Major
>
> We are seeing the following behavior: when a scheduled pod's request for a
> PVC times out, the pod is marked as unschedulable. Such pods are not retried
> and remain in the 'Pending' state. With pods pending, the autoscaler does
> not scale down nodes. This seems similar to the issue discussed here:
> [https://github.com/kubernetes/autoscaler/issues/3409]
>
> {quote}Error from YuniKorn logs:
> ERROR cache/context.go:527 Failed to bind pod volumes {"podName":
> "<PODNAME>", "nodeName": "<IP>", "dynamicProvisions": 1, "staticBindings": 0}
> ...
> ...
> /workspace/pkg/cache/task.go:382
> 2023-02-20T00:02:22.368Z ERROR cache/task.go:265 task failed {"appID":
> "<APPID>", "taskID": "45981d91-e543-459b-9657-bdc03b57e26f", "reason": "bind
> pod volumes failed, name: <NS/PODNAME>, binding volumes: timed out waiting
> for the condition"}
> {quote}
>
> {{From Autoscaler logs}}
> {quote}I0220 20:47:01.775653 1 static_autoscaler.go:502] Scale down status:
> unneededOnly=true lastScaleUpTime=2023-02-20 19:20:56.429598603 +0000 UTC
> m=+249612.380355315 lastScaleDownDeleteTime=2023-02-20 06:36:50.929515212
> +0000 UTC m=+203766.880271921 lastScaleDownFailTime=2023-02-17
> 22:01:33.693397034 +0000 UTC m=+49.644153730 scaleDownForbidden=true
> isDeleteInProgress=false scaleDownInCooldown=true
> I0220 20:47:11.787999 1 static_autoscaler.go:228] Starting main loop
> I0220 20:47:11.792789 1 filter_out_schedulable.go:65] Filtering out
> schedulables
> I0220 20:47:11.792953 1 scheduler_binder.go:829] All bound volumes for Pod
> "<podname>" match with Node "<node>"
> I0220 20:47:11.792981 1 filter_out_schedulable.go:118] Pod <podname> marked
> as unschedulable can be scheduled on node <node> (based on hinting). Ignoring
> in scale up.
> {quote}
>
> # Can YuniKorn introduce retries for such scenarios?
> # Can pods be set to an error state after retries?
> {{Note: pod name, node name, and IP are masked above}}
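
To make the two questions above concrete, here is a minimal Go sketch of bounded retries with exponential backoff around a volume bind, followed by a terminal error once the attempts run out. It is only an illustration of the requested behavior, not YuniKorn's actual code: `retryPolicy`, `bindWithRetry`, and `errBindTimeout` are hypothetical names, and the bind operation is simulated by a closure.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errBindTimeout stands in for the "timed out waiting for the condition"
// failure seen in the log above. Hypothetical: not a YuniKorn identifier.
var errBindTimeout = errors.New("binding volumes: timed out waiting for the condition")

// retryPolicy is a hypothetical set of knobs for retrying a bind.
type retryPolicy struct {
	maxAttempts  int           // total attempts, including the first one
	initialDelay time.Duration // wait before the second attempt
	factor       int           // multiplier applied to the delay after each retry
}

// bindWithRetry calls bind until it succeeds or maxAttempts is reached.
// It returns the number of attempts made and, if every attempt failed,
// the last error wrapped with the attempt count, so that the caller can
// move the pod to an error state instead of leaving it pending.
func bindWithRetry(p retryPolicy, bind func() error) (int, error) {
	if p.maxAttempts < 1 {
		p.maxAttempts = 1 // always try at least once
	}
	delay := p.initialDelay
	var lastErr error
	for attempt := 1; attempt <= p.maxAttempts; attempt++ {
		if lastErr = bind(); lastErr == nil {
			return attempt, nil
		}
		if attempt < p.maxAttempts {
			time.Sleep(delay)
			delay *= time.Duration(p.factor)
		}
	}
	return p.maxAttempts, fmt.Errorf("giving up after %d attempts: %w", p.maxAttempts, lastErr)
}

func main() {
	calls := 0
	// Simulated bind: times out twice, then succeeds.
	bind := func() error {
		calls++
		if calls < 3 {
			return errBindTimeout
		}
		return nil
	}
	policy := retryPolicy{maxAttempts: 5, initialDelay: 10 * time.Millisecond, factor: 2}
	attempts, err := bindWithRetry(policy, bind)
	fmt.Println("attempts:", attempts, "error:", err)
}
```

Wrapping the last error with `%w` keeps the original timeout visible to `errors.Is`, so a caller could tell an exhausted volume bind apart from other failures before deciding whether to fail the pod.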