gaoyunhaii commented on a change in pull request #8778: 
[FLINK-12615][coordination] Track partitions on JM
URL: https://github.com/apache/flink/pull/8778#discussion_r297030031
 
 

 ##########
 File path: flink-runtime/src/main/java/org/apache/flink/runtime/jobmaster/JobMaster.java
 ##########
 @@ -876,6 +887,17 @@ private void jobStatusChanged(
                validateRunsInMainThread();
 
                if (newJobStatus.isGloballyTerminalState()) {
+                       // other terminal job states are handled by the executions
+                       if (newJobStatus == JobStatus.FINISHED) {
 
 Review comment:
   From my understanding, we only need to consider FINISHED here because, when 
the job transitions to CANCELED or FAILED, all execution vertices are canceled 
and thus their result partitions are released via 
`sendReleaseIntermediateResultPartitionsRpcCall`. 
   
   However, with region failover, an execution may need to re-run after it 
has reached FINISHED. Consider a topology like `A -> C, B -> C`: A and B have 
both finished, but C fails to read data from A. Then B also needs to re-execute 
if the partitioning is random (like RebalancePartition). When the execution 
vertex goes through `resetForNewExecution`, the result partition of the 
previous execution is not released. When we later `cancel` or `failGlobal` the 
ExecutionGraph, only the latest execution of each vertex is considered, so the 
result partitions of the previous executions may never be released explicitly.
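   
   To make the concern concrete, here is a minimal sketch of the scenario. It 
uses hypothetical toy classes (`ToyVertex`, `finish`, `cancel`), not the actual 
ExecutionGraph code, and it assumes that a reset starts a new attempt without 
releasing the old partition and that cancel only visits the current attempt:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class PartitionLeakSketch {

    /** Hypothetical stand-in for an execution vertex; it only knows its current attempt. */
    static final class ToyVertex {
        final String name;
        int attempt = 0;

        ToyVertex(String name) {
            this.name = name;
        }

        /** Identifier of the partition produced by the current attempt, e.g. "A#0". */
        String currentPartition() {
            return name + "#" + attempt;
        }

        /** Assumption: starts a new attempt and does NOT release the old partition. */
        void resetForNewExecution() {
            attempt++;
        }
    }

    /** The current attempt finishes and its result partition becomes live. */
    static void finish(ToyVertex vertex, Set<String> livePartitions) {
        livePartitions.add(vertex.currentPartition());
    }

    /** Assumption: cancel releases only the partition of each vertex's current attempt. */
    static void cancel(List<ToyVertex> vertices, Set<String> livePartitions) {
        for (ToyVertex vertex : vertices) {
            livePartitions.remove(vertex.currentPartition());
        }
    }

    /** Replays the A -> C, B -> C scenario and returns the partitions never released. */
    static Set<String> run() {
        Set<String> livePartitions = new HashSet<>();
        ToyVertex a = new ToyVertex("A");
        ToyVertex b = new ToyVertex("B");

        // First attempts of A and B finish and produce partitions A#0 and B#0.
        finish(a, livePartitions);
        finish(b, livePartitions);

        // C fails to read from A; region failover resets both producers.
        a.resetForNewExecution();
        b.resetForNewExecution();

        // Second attempts finish and produce A#1 and B#1.
        finish(a, livePartitions);
        finish(b, livePartitions);

        // The job is canceled; only the current attempts are visited.
        cancel(Arrays.asList(a, b), livePartitions);
        return livePartitions;
    }

    public static void main(String[] args) {
        // Sorted for stable output; prints [A#0, B#0]
        System.out.println(new TreeSet<>(run()));
    }
}
```

   In this toy model the partitions of the first attempts stay live after 
cancel, which is the situation I am worried about for the real tracker.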

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
