[jira] [Updated] (YUNIKORN-2929) Implement Skip Allocation Check for Unsuccessful Pods

Craig Condit (Jira) Wed, 19 Feb 2025 14:40:31 -0800


     [ 
https://issues.apache.org/jira/browse/YUNIKORN-2929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Craig Condit updated YUNIKORN-2929:
-----------------------------------
    Target Version:   (was: 1.7.0)

>  Implement Skip Allocation Check for Unsuccessful Pods
> ------------------------------------------------------
>
>                 Key: YUNIKORN-2929
>                 URL: https://issues.apache.org/jira/browse/YUNIKORN-2929
>             Project: Apache YuniKorn
>          Issue Type: Task
>          Components: core - scheduler
>            Reporter: Mit Desai
>            Assignee: Mit Desai
>            Priority: Major
>
> Skip allocation attempts for subsequent pods in an application if previous 
> pods have failed to allocate.
>
> When running Spark applications, if an executor pod fails to find a suitable 
> node, it is likely that subsequent executor pods will also fail to find 
> nodes. This is particularly problematic when the application has a toleration 
> for a specific taint and there are limited nodes with that taint. The 
> scheduler spends excessive time attempting to allocate pods, ultimately 
> resulting in no pods being bound to nodes.
>
> To optimize scheduling, we should:
>  # Implement a check to determine if previous pods in the same application 
> were successfully allocated.
>  # Skip processing other pods in the application if previous pods failed to 
> allocate.
>  # Generalize this by:
>  ** Adding an immediate action for Spark applications.
>  ** Introducing a threshold of 'n' failed pods, after which the scheduler 
> will stop trying to allocate for the application and restart the scheduling cycle.
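
The threshold idea in the description above can be sketched roughly as follows. This is only an illustration under stated assumptions, not YuniKorn core code: the type skipTracker, its methods, the application ID "spark-app" and the pod names are all hypothetical, and the choice to reset the counter after a successful allocation is an assumption the issue does not spell out.

```go
package main

import "fmt"

// skipTracker is a hypothetical helper (not part of the YuniKorn API).
// It counts consecutive failed allocation attempts per application
// within one scheduling cycle.
type skipTracker struct {
	threshold int            // the 'n' from the issue description
	failures  map[string]int // application ID -> consecutive failures
}

func newSkipTracker(threshold int) *skipTracker {
	return &skipTracker{threshold: threshold, failures: make(map[string]int)}
}

// record stores the outcome of one allocation attempt.
// Assumption: a successful allocation clears the failure count.
func (t *skipTracker) record(appID string, allocated bool) {
	if allocated {
		delete(t.failures, appID)
		return
	}
	t.failures[appID]++
}

// shouldSkip reports whether the application has reached the threshold
// and its remaining pods should not be tried in this cycle.
func (t *skipTracker) shouldSkip(appID string) bool {
	return t.failures[appID] >= t.threshold
}

// reset clears all counters; it would be called when a new scheduling
// cycle starts so that skipped applications are tried again.
func (t *skipTracker) reset() {
	t.failures = make(map[string]int)
}

func main() {
	tracker := newSkipTracker(2)
	pods := []string{"exec-1", "exec-2", "exec-3", "exec-4"}

	attempted := 0
	for _, pod := range pods {
		if tracker.shouldSkip("spark-app") {
			fmt.Println("skip", pod)
			continue
		}
		attempted++
		// Simulate the case from the description: no node with the
		// required taint has room, so every attempt fails.
		tracker.record("spark-app", false)
	}
	fmt.Println("attempted:", attempted)
}
```

With a threshold of 2, the loop attempts exec-1 and exec-2, then skips exec-3 and exec-4. Calling reset at the start of the next cycle gives the application another chance once nodes may have freed up.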



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
