[ 
https://issues.apache.org/jira/browse/YUNIKORN-854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17417287#comment-17417287
 ] 

Wilfred Spiegelenburg commented on YUNIKORN-854:
------------------------------------------------

All three cases should now be handled.

The one caveat with this change is that the placeholder is not reinstated and 
will not be used when the real workload of the application is retried. Since 
the real pod has not been assigned to any node, it will be rescheduled as per 
normal behaviour.

> node removal with inflight placeholder replacement failure
> ----------------------------------------------------------
>
>                 Key: YUNIKORN-854
>                 URL: https://issues.apache.org/jira/browse/YUNIKORN-854
>             Project: Apache YuniKorn
>          Issue Type: Bug
>          Components: core - scheduler
>    Affects Versions: 0.10
>            Reporter: Wilfred Spiegelenburg
>            Assignee: Wilfred Spiegelenburg
>            Priority: Critical
>              Labels: pull-request-available
>
> If a node gets removed while a replacement of a placeholder pod is inflight 
> we lose track:
>  * placeholder and real allocation on the same node causes the real pod to 
> never be allocated as the ask repeat has been decremented but the allocation 
> is removed being processed
>  * placeholder and real allocation on different nodes causes the real 
> allocation to be linked to a node but never bound on K8s if the placeholder 
> node is removed.
>  * placeholder and real allocation on different nodes causes a zombie 
> allocation on the application if the real allocation node is removed.



