[jira] [Created] (YUNIKORN-2929) Skip allocation attempts for subsequent pods in an application if previous pods have failed to allocate

Mit Desai (Jira) Tue, 15 Oct 2024 16:58:32 -0700

Mit Desai created YUNIKORN-2929:
-----------------------------------

             Summary: Skip allocation attempts for subsequent pods in an 
application if previous pods have failed to allocate
                 Key: YUNIKORN-2929
                 URL: https://issues.apache.org/jira/browse/YUNIKORN-2929
             Project: Apache YuniKorn
          Issue Type: Task
            Reporter: Mit Desai
            Assignee: Mit Desai



When running Spark applications, if an executor pod fails to find a suitable 
node, it is likely that subsequent executor pods will also fail to find nodes. 
This is particularly problematic when the application has a toleration for a 
specific taint and only a few nodes carry that taint. The scheduler then 
spends excessive time attempting to allocate each remaining pod, even though 
none of them end up bound to a node.

To optimize scheduling, we should:
 # Check whether earlier pods in the same application were successfully 
allocated.
 # Skip the application's remaining pods if earlier pods failed to allocate.
 # Generalize this by:
 ** Adding an immediate action for Spark applications.
 ** Introducing a threshold ('n' pods that failed to allocate) after which the 
scheduler stops trying and restarts the scheduling cycle.
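
The steps above could be sketched roughly as follows. This is only an 
illustration of the idea, not YuniKorn code: Ask, TryAllocate, Result and 
ScheduleApp are hypothetical names, and counting failures per pass over the 
application is an assumption. Under that assumption, the "immediate action" 
for Spark would correspond to a threshold of 1.

```go
package main

// Ask is a hypothetical stand-in for one pending pod request.
// It is not a YuniKorn type.
type Ask struct {
	Name string
}

// TryAllocate is a hypothetical callback that reports whether a
// suitable node was found for the given ask.
type TryAllocate func(a Ask) bool

// Result summarises one pass over an application's pending asks.
type Result struct {
	Allocated []string // asks that found a node
	Failed    []string // asks that were tried and found no node
	Skipped   []string // asks that were never tried in this pass
}

// ScheduleApp tries the asks in order. Once maxFailures asks have
// failed in this pass, the remaining asks are skipped so the caller
// can move on. A maxFailures of 0 or less disables the check
// (assumption: that matches the current try-everything behaviour).
func ScheduleApp(asks []Ask, maxFailures int, try TryAllocate) Result {
	var res Result
	failures := 0
	for i, a := range asks {
		if maxFailures > 0 && failures >= maxFailures {
			// Threshold reached: record the rest as skipped and stop.
			for _, rest := range asks[i:] {
				res.Skipped = append(res.Skipped, rest.Name)
			}
			break
		}
		if try(a) {
			res.Allocated = append(res.Allocated, a.Name)
		} else {
			res.Failed = append(res.Failed, a.Name)
			failures++
		}
	}
	return res
}
```

With four pending executor asks, a threshold of 1, and an allocation callback 
that always fails, only the first ask is attempted and the other three are 
skipped for that pass.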



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

