zentol opened a new pull request #14951: URL: https://github.com/apache/flink/pull/14951
Hides the fact that the ExecutionGraph can reach a globally-terminal state while the DeclarativeScheduler is still in a Restarting/Canceling/Failing/Executing state. This can happen because the transition into a globally-terminal state in the EG and the scheduler's transition to WaitingForResources/Finished do not happen atomically with respect to the main thread.

This should not happen during Restarting because it would break the contract that a globally-terminal job never transitions into another state. As for Restarting/Canceling/Failing/Executing, this is mostly for consistency; we should ensure that the scheduler has a chance to clean up whatever it wants before we communicate to the outside that the job is done.

In short, `ArchivedExecutionGraph#createFrom` now accepts an optional non-terminal JobStatus that overrides the state of the ArchivedExecutionGraph. Additionally, if the override is set, the timestamps for all globally-terminal state transitions are removed (or rather, not copied). All `StateWithExecutionGraph` classes now also pin the job state within `getJobStatus()`.
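The status override described above can be illustrated with a small, self-contained Java model. This is only a sketch under stated assumptions, not the code from the pull request: `StatusOverrideSketch`, `Snapshot`, and the simplified `JobStatus` enum are hypothetical stand-ins, and the nullable `statusOverride` parameter is an assumed way of expressing "optional". It shows the two behaviours the description names: the reported status is replaced by the non-terminal override, and timestamps of globally-terminal states are not copied when an override is given.

```java
import java.util.EnumMap;
import java.util.Map;

public class StatusOverrideSketch {

    /** Simplified stand-in for a job status enum (hypothetical, not Flink's class). */
    public enum JobStatus {
        RUNNING(false), RESTARTING(false), CANCELLING(false), FAILING(false),
        FINISHED(true), CANCELED(true), FAILED(true);

        private final boolean globallyTerminal;

        JobStatus(boolean globallyTerminal) {
            this.globallyTerminal = globallyTerminal;
        }

        public boolean isGloballyTerminalState() {
            return globallyTerminal;
        }
    }

    /** Hypothetical immutable snapshot, standing in for an archived graph. */
    public static final class Snapshot {
        public final JobStatus status;
        public final Map<JobStatus, Long> timestamps;

        Snapshot(JobStatus status, Map<JobStatus, Long> timestamps) {
            this.status = status;
            this.timestamps = timestamps;
        }
    }

    /**
     * Creates a snapshot. If statusOverride is non-null it must be non-terminal;
     * it then replaces the actual status, and timestamps of globally-terminal
     * states are skipped so the snapshot never looks "done" to the outside.
     */
    public static Snapshot createFrom(
            JobStatus actualStatus,
            Map<JobStatus, Long> actualTimestamps,
            JobStatus statusOverride) {
        if (statusOverride != null && statusOverride.isGloballyTerminalState()) {
            throw new IllegalArgumentException("Status override must be non-terminal.");
        }
        Map<JobStatus, Long> copied = new EnumMap<>(JobStatus.class);
        for (Map.Entry<JobStatus, Long> entry : actualTimestamps.entrySet()) {
            boolean hide = statusOverride != null && entry.getKey().isGloballyTerminalState();
            if (!hide) {
                copied.put(entry.getKey(), entry.getValue());
            }
        }
        JobStatus reported = statusOverride != null ? statusOverride : actualStatus;
        return new Snapshot(reported, copied);
    }

    public static void main(String[] args) {
        Map<JobStatus, Long> timestamps = new EnumMap<>(JobStatus.class);
        timestamps.put(JobStatus.RUNNING, 100L);
        timestamps.put(JobStatus.FINISHED, 200L);

        // The graph already finished, but the scheduler state pins RESTARTING.
        Snapshot pinned = createFrom(JobStatus.FINISHED, timestamps, JobStatus.RESTARTING);
        System.out.println(pinned.status + " " + pinned.timestamps.keySet());
    }
}
```

In this model, a scheduler state that holds an execution graph would pass its own pinned status as `statusOverride` when answering a status query, and pass `null` only once it has actually moved to its final state.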
