XComp commented on a change in pull request #18910:
URL: https://github.com/apache/flink/pull/18910#discussion_r815822695



##########
File path: docs/content/docs/internals/job_scheduling.md
##########
@@ -95,3 +95,22 @@ For that reason, the execution of an ExecutionVertex is tracked in an {{< gh_lin
 {{< img src="/fig/state_machine.svg" alt="States and Transitions of Task Executions" width="50%" >}}
 
 {{< top >}}
+
+## Repeatable Resource Cleanup Strategy
+
+Once a job has reached a globally terminal state (finished, failed, or cancelled), the
+resources associated with the job are cleaned up. This is done via a repeatable
+resource cleanup strategy, in which a failed attempt to clean up a resource results in
+retries separated by an exponentially increasing delay, within configured bounds.
+
+{{< img src="/fig/repeatable_cleanup.png" alt="Repeatable resource cleanup across a failover event" >}}
+
+After a failover event, the JobResultStore is used to determine which globally terminated jobs
+still need to be cleaned up: if an entry exists for the job and is dirty, the job's resources
+still need to be cleaned up; if the entry is clean, no further cleanup is needed.
+
+The repeatable resource cleanup strategy has sensible defaults for the minimum and maximum

Review comment:
       After PR #18913 has been approved, you can use the information from the branch to update the documentation, I guess.
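The failover recovery and retry behaviour described in the proposed section can be illustrated with a minimal Java sketch. It is hypothetical and is not Flink's implementation or API: `InMemoryJobResultStore`, `CleanupAction`, `Sleeper`, `backoff`, and `recoverAfterFailover` are illustrative names. Two details are assumptions that the section does not specify: the delay formula (initial delay doubled on each retry, capped at a maximum) and the bounded `maxAttempts`.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the documented behaviour; NOT Flink's actual API.
final class RepeatableCleanupSketch {

    /** Hypothetical cleanup step for a single job. */
    interface CleanupAction {
        void run(String jobId) throws Exception;
    }

    /** Abstraction over waiting, so the retry delays can be observed without sleeping. */
    interface Sleeper {
        void sleep(Duration delay) throws InterruptedException;
    }

    /** Hypothetical in-memory stand-in for a JobResultStore: jobId -> dirty flag. */
    static final class InMemoryJobResultStore {
        private final Map<String, Boolean> dirtyByJobId = new LinkedHashMap<>();

        /** Written when the job reaches a globally terminal state; cleanup is still pending. */
        void createDirty(String jobId) { dirtyByJobId.put(jobId, true); }

        /** Written once all resources of the job have been cleaned up. */
        void markClean(String jobId) { dirtyByJobId.put(jobId, false); }

        /** Jobs whose entry exists and is still dirty, i.e. that still need cleanup. */
        List<String> dirtyJobIds() {
            List<String> ids = new ArrayList<>();
            dirtyByJobId.forEach((id, dirty) -> { if (dirty) ids.add(id); });
            return ids;
        }
    }

    /** Delay before retry number `attempt` (0-based): initial * 2^attempt, capped at max (assumed formula). */
    static Duration backoff(int attempt, Duration initial, Duration max) {
        long maxMillis = max.toMillis();
        long delay = initial.toMillis();
        for (int i = 0; i < attempt && delay < maxMillis; i++) {
            delay = Math.min(delay * 2, maxMillis); // doubling stops at the cap, avoiding overflow
        }
        return Duration.ofMillis(Math.min(delay, maxMillis));
    }

    /** Runs the cleanup, retrying failures with exponential backoff for at most maxAttempts tries. */
    static boolean cleanupWithRetries(String jobId, CleanupAction action, Duration initial,
                                      Duration max, int maxAttempts, Sleeper sleeper) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                action.run(jobId);
                return true;
            } catch (Exception e) {
                if (attempt + 1 == maxAttempts) {
                    break; // attempt bound reached: give up, the entry stays dirty
                }
                try {
                    sleeper.sleep(backoff(attempt, initial, max));
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt(); // preserve the interrupt status and stop
                    return false;
                }
            }
        }
        return false;
    }

    /** After a failover: clean up every job whose entry is dirty, then mark that entry clean. */
    static List<String> recoverAfterFailover(InMemoryJobResultStore store, CleanupAction action,
                                             Duration initial, Duration max, int maxAttempts,
                                             Sleeper sleeper) {
        List<String> cleaned = new ArrayList<>();
        for (String jobId : store.dirtyJobIds()) {
            if (cleanupWithRetries(jobId, action, initial, max, maxAttempts, sleeper)) {
                store.markClean(jobId);
                cleaned.add(jobId);
            }
        }
        return cleaned;
    }
}
```

Outside of tests, the sleeper would actually wait, for example `delay -> Thread.sleep(delay.toMillis())`. Injecting it keeps the retry loop observable without real delays. A job whose cleanup exhausts its attempts keeps a dirty entry, so a later recovery pass would pick it up again.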




-- 
This is an automated message from the Apache Git Service.