gaoyunhaii commented on a change in pull request #8778:
[FLINK-12615][coordination] Track partitions on JM
URL: https://github.com/apache/flink/pull/8778#discussion_r297030031
##########
File path:
flink-runtime/src/main/java/org/apache/flink/runtime/jobmaster/JobMaster.java
##########
@@ -876,6 +887,17 @@ private void jobStatusChanged(
validateRunsInMainThread();
if (newJobStatus.isGloballyTerminalState()) {
+ // other terminal job states are handled by the executions
+ if (newJobStatus == JobStatus.FINISHED) {
Review comment:
From my understanding, we only need to handle FINISHED here because, when the
job transitions to CANCELED or FAILED, all execution vertices are canceled and
thus their result partitions are released via
`sendReleaseIntermediateResultPartitionsRpcCall`.
However, with region failover, an execution may need to re-run after it has
reached FINISHED. This happens in cases like `A -> C, B -> C`: A and B are both
finished, but C fails to read data from A. B then also needs to re-execute if
the partitioning is random (like RebalancePartition). When the execution vertex
goes through `resetForNewExecution`, the result partition of the previous
execution is not released. When we later `cancel` or `failGlobal` the
ExecutionGraph, only the latest execution of each vertex is considered, so the
result partitions of earlier executions may be leaked.
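To make the scenario concrete, here is a minimal toy model of it. This is not Flink code: `PartitionLeakSketch`, `Vertex`, `finish`, `simulate`, and the `releaseOnReset` flag are hypothetical names made up for illustration, and the only assumption carried over from the comment is that a global cancel looks at the latest execution of each vertex only.

```java
import java.util.LinkedHashSet;
import java.util.Set;

/** Toy model of the leak described above; not Flink code, all names are hypothetical. */
class PartitionLeakSketch {

    /** A vertex that only remembers the result partition of its latest execution. */
    static final class Vertex {
        private final String name;
        private final Set<String> livePartitions; // shared view of unreleased partitions
        private int attempt = 0;
        private String currentPartition; // partition of the latest execution only

        Vertex(String name, Set<String> livePartitions) {
            this.name = name;
            this.livePartitions = livePartitions;
        }

        /** The current execution reaches FINISHED and leaves a result partition behind. */
        void finish() {
            currentPartition = name + "#" + attempt;
            livePartitions.add(currentPartition);
        }

        /** Starts a new attempt; optionally releases the previous attempt's partition first. */
        void resetForNewExecution(boolean releasePrevious) {
            if (releasePrevious) {
                release();
            }
            currentPartition = null; // the old partition is forgotten either way
            attempt++;
        }

        /** Global cancel/fail: only the latest execution's partition is released. */
        void cancel() {
            release();
        }

        private void release() {
            if (currentPartition != null) {
                livePartitions.remove(currentPartition);
                currentPartition = null;
            }
        }
    }

    /** Runs A -> C, B -> C with one region failover and returns the partitions never released. */
    static Set<String> simulate(boolean releaseOnReset) {
        Set<String> live = new LinkedHashSet<>();
        Vertex a = new Vertex("A", live);
        Vertex b = new Vertex("B", live);

        a.finish();
        b.finish();
        // C fails to read from A; with a random partitioner B must re-run too.
        a.resetForNewExecution(releaseOnReset);
        b.resetForNewExecution(releaseOnReset);
        a.finish();
        b.finish();
        // The job is then cancelled (or fails globally).
        a.cancel();
        b.cancel();
        return live;
    }

    public static void main(String[] args) {
        System.out.println("leaked without release on reset: " + simulate(false));
        System.out.println("leaked with release on reset:    " + simulate(true));
    }
}
```

In this sketch, `simulate(false)` leaves the first-attempt partitions of A and B behind, while `simulate(true)` leaves nothing. Whether the real fix belongs in `resetForNewExecution` or in the partition tracking on the JM side is a separate design question that the model does not answer.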