[ https://issues.apache.org/jira/browse/YUNIKORN-1337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Peter Bacsko updated YUNIKORN-1337:
-----------------------------------
Description:
The problem was introduced by YUNIKORN-1205.
When the placeholders are running and the entire job is deleted, the
application won't perform a state transition from Accepted to Completing. The
reason is that there are still placeholder allocations, so
{{removeAsksInternal()}} will not trigger it.
On the other hand, when the allocations are removed, the innermost {{if}}
branch will not be taken, because the application is still in the Accepted
state and none of the checked states apply:
{noformat}
if alloc.placeholder {
    sa.allocatedPlaceholder = resources.Sub(sa.allocatedPlaceholder, alloc.AllocatedResource)
    // if all the placeholders are replaced, clear the placeholder timer
    if resources.IsZero(sa.allocatedPlaceholder) {
        sa.clearPlaceholderTimer()
        if (sa.IsCompleting() && sa.stateTimer == nil) ||
            sa.IsFailing() || sa.IsResuming() {
            ... // this will be skipped
        }
    }
}
{noformat}
We have to check if there is no allocation left and then trigger a
{{CompleteApplication}} event.
was:
The problem was introduced by YUNIKORN-1205.
When the placeholders are running and the entire job is deleted, the
application won't perform a state transition from Accepted to Completing. The
reason is that there are still placeholder allocations, so
{{removeAsksInternal()}} will not trigger it.
On the other hand, when the allocations are removed, this {{if}} branch will
not be taken:
{noformat}
if alloc.placeholder {
    sa.allocatedPlaceholder = resources.Sub(sa.allocatedPlaceholder, alloc.AllocatedResource)
    // if all the placeholders are replaced, clear the placeholder timer
    if resources.IsZero(sa.allocatedPlaceholder) {
        sa.clearPlaceholderTimer()
        if (sa.IsCompleting() && sa.stateTimer == nil) ||
            sa.IsFailing() || sa.IsResuming() {
            ... // this will be skipped
        }
    }
}
{noformat}
We have to check if there is no allocation left and then trigger a
{{CompleteApplication}} event.
> Application state stuck in "Accepted" when placeholders are running and the
> job is deleted
> ------------------------------------------------------------------------------------------
>
> Key: YUNIKORN-1337
> URL: https://issues.apache.org/jira/browse/YUNIKORN-1337
> Project: Apache YuniKorn
> Issue Type: Bug
> Components: core - scheduler
> Reporter: Peter Bacsko
> Assignee: Peter Bacsko
> Priority: Major
>
> The problem was introduced by YUNIKORN-1205.
> When the placeholders are running and the entire job is deleted, the
> application won't perform a state transition from Accepted to Completing. The
> reason is that there are still placeholder allocations, so
> {{removeAsksInternal()}} will not trigger it.
> On the other hand, when the allocations are removed, the innermost {{if}}
> branch will not be taken, because the application is still in the Accepted
> state and none of the checked states apply:
> {noformat}
> if alloc.placeholder {
>     sa.allocatedPlaceholder = resources.Sub(sa.allocatedPlaceholder, alloc.AllocatedResource)
>     // if all the placeholders are replaced, clear the placeholder timer
>     if resources.IsZero(sa.allocatedPlaceholder) {
>         sa.clearPlaceholderTimer()
>         if (sa.IsCompleting() && sa.stateTimer == nil) ||
>             sa.IsFailing() || sa.IsResuming() {
>             ... // this will be skipped
>         }
>     }
> }
> {noformat}
> We have to check if there is no allocation left and then trigger a
> {{CompleteApplication}} event.
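
To make the described sequence easier to follow, here is a minimal, self-contained sketch of the stuck state and of the proposed check. It is a toy model and not the YuniKorn code: the {{app}} type, its counters, {{completeIfEmpty()}}, {{removeAsks()}}, {{removePlaceholder()}} and {{run()}} are all hypothetical names made up for illustration, and the state change stands in for handling a {{CompleteApplication}} event.

```go
package main

import "fmt"

// state is a simplified application state; the real scheduler has more states.
type state string

const (
	accepted   state = "Accepted"
	completing state = "Completing"
)

// app is a hypothetical, stripped-down stand-in for the scheduler's
// application object. It only tracks counts, not resources.
type app struct {
	state        state
	pendingAsks  int
	placeholders int // running placeholder allocations
	allocations  int // real (non-placeholder) allocations
}

// completeIfEmpty models the proposed check: once nothing is pending and
// nothing is allocated, move the application out of Accepted. The assignment
// stands in for handling a CompleteApplication event.
func (a *app) completeIfEmpty() {
	if a.state == accepted && a.pendingAsks == 0 &&
		a.placeholders == 0 && a.allocations == 0 {
		a.state = completing
	}
}

// removeAsks drops all pending asks. As described in the issue, this does not
// complete the application while placeholder allocations still exist.
func (a *app) removeAsks() {
	a.pendingAsks = 0
	a.completeIfEmpty()
}

// removePlaceholder removes one placeholder allocation. With withFix == false
// it models the reported behaviour: the only follow-up logic applies to the
// Completing, Failing or Resuming states, so an Accepted application is left
// untouched. With withFix == true it additionally runs the proposed check.
func (a *app) removePlaceholder(withFix bool) {
	if a.placeholders == 0 {
		return
	}
	a.placeholders--
	if withFix {
		a.completeIfEmpty()
	}
}

// run replays the scenario from the issue: placeholders are running, then the
// whole job is deleted (asks first, then the placeholder allocations).
func run(withFix bool) state {
	a := &app{state: accepted, pendingAsks: 1, placeholders: 2}
	a.removeAsks()
	a.removePlaceholder(withFix)
	a.removePlaceholder(withFix)
	return a.state
}

func main() {
	fmt.Println("without check:", run(false)) // prints "without check: Accepted"
	fmt.Println("with check:", run(true))     // prints "with check: Completing"
}
```

In this model the application only leaves Accepted when the emptiness check also runs on the allocation-removal path, which is the gap the description points at.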