Wilfred Spiegelenburg created YUNIKORN-574:
----------------------------------------------
Summary: Wait for placeholder cleanup
Key: YUNIKORN-574
URL: https://issues.apache.org/jira/browse/YUNIKORN-574
Project: Apache YuniKorn
Issue Type: Bug
Components: core - scheduler
Reporter: Wilfred Spiegelenburg
When we clean up the application in {{timeoutPlaceholderProcessing()}} we
have two cases.
* First case: we clean up all lingering placeholder allocations on the running
app.
* Second case is the failure of the app: we clean up lingering asks (no response
needed from the shim) and all placeholders, after which we fail the app.
The cleanup of the placeholders in both these cases is instigated by the core
and we need to wait for the cleanup to happen on the shim side before we
proceed. It is not like the removal of the app signalled by the RM: this comes
as an unexpected request for the shim, not when the app is deleted on the shim
side.
For case 1 we do not have a problem. The placeholders are terminated and the
app runs as per normal and is not moved to Completed until all is finished.
We do NOT have an issue in the states leading to Completed as we have already
handled it there (see below).
For the failure case we immediately unlink the queue as we move into the FAILED
state, because the move calls {{moveTerminatedApp()}} via the callback. That
causes an issue: we should be waiting for the shim to respond back to the core
with the confirmation of the removal.
This might require a new state to do this in two steps: trigger the cleanup and
move to the Failing state, then, when all is cleaned up, move to Failed.
BTW: introducing a new state for Failing should also include the rename of
Waiting to Completing, as that is in line with what the state does and lines up
between the two final states.
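
The two-step failure handling proposed above could look roughly like the sketch below. It is only an illustration of the idea, not the core's actual code: the {{app}} type, its fields, and the {{startFailing()}} and {{confirmRelease()}} helpers are all hypothetical names, and the real state handling in the core is more involved. The point is that the queue is only unlinked once the shim has confirmed the removal of every placeholder.

```go
package main

import "fmt"

// State is a simplified application state. Failing is the intermediate
// state proposed in this issue; the type itself is an illustration only.
type State int

const (
	Running State = iota
	Failing
	Failed
)

func (s State) String() string {
	switch s {
	case Running:
		return "Running"
	case Failing:
		return "Failing"
	case Failed:
		return "Failed"
	}
	return "Unknown"
}

// app is a hypothetical stand-in for the core application object.
type app struct {
	state        State
	placeholders map[string]struct{} // placeholder allocations not yet confirmed removed
	queueLinked  bool                // true while the app is still linked to its queue
}

// newApp creates a running app holding the given placeholder allocation IDs.
func newApp(placeholderIDs ...string) *app {
	a := &app{
		state:        Running,
		placeholders: make(map[string]struct{}),
		queueLinked:  true,
	}
	for _, id := range placeholderIDs {
		a.placeholders[id] = struct{}{}
	}
	return a
}

// startFailing is step one: the release requests would be sent to the shim
// here, and the app moves to Failing instead of straight to Failed.
func (a *app) startFailing() {
	if a.state != Running {
		return
	}
	a.state = Failing
	a.finishIfClean()
}

// confirmRelease records that the shim confirmed the removal of one placeholder.
func (a *app) confirmRelease(id string) {
	delete(a.placeholders, id)
	a.finishIfClean()
}

// finishIfClean is step two: only when nothing is left to confirm does the
// app move to Failed and get unlinked from the queue.
func (a *app) finishIfClean() {
	if a.state == Failing && len(a.placeholders) == 0 {
		a.state = Failed
		a.queueLinked = false
	}
}

func main() {
	a := newApp("ph-1", "ph-2")
	a.startFailing()
	fmt.Println(a.state, a.queueLinked)
	a.confirmRelease("ph-1")
	fmt.Println(a.state, a.queueLinked)
	a.confirmRelease("ph-2")
	fmt.Println(a.state, a.queueLinked)
}
```

In this sketch a confirmation that arrives while the app is still running (case 1) only removes the placeholder from the tracked set and does not change the state.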
--
This message was sent by Atlassian Jira
(v8.3.4#803005)