gaoyunhaii commented on a change in pull request #8778: 
[FLINK-12615][coordination] Track partitions on JM
URL: https://github.com/apache/flink/pull/8778#discussion_r297030031
 
 

 ##########
 File path: flink-runtime/src/main/java/org/apache/flink/runtime/jobmaster/JobMaster.java
 ##########
 @@ -876,6 +887,17 @@ private void jobStatusChanged(
                validateRunsInMainThread();
 
                if (newJobStatus.isGloballyTerminalState()) {
+                       // other terminal job states are handled by the executions
+                       if (newJobStatus == JobStatus.FINISHED) {
 
 Review comment:
   From my understanding, we only need to handle FINISHED here. When the job
transitions to CANCELED or FAILED, all execution vertices are canceled, and
their result partitions are therefore released via
`sendReleaseIntermediateResultPartitionsRpcCall`.
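To make this argument concrete, here is a toy Java model. All types in it are hypothetical and are not Flink classes. It rests on the assumption stated above: cancelling a vertex, even one that has already finished, releases the partition of its current execution. Under that assumption, only FINISHED leaves the JobMaster with partitions it must release itself.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Toy model for illustration only; none of these types are Flink classes. */
final class TerminalStateSketch {

    /** Hypothetical subset of the globally terminal job states. */
    enum Status { FINISHED, CANCELED, FAILED }

    /** Hypothetical vertex that knows only its current execution's partition. */
    static final class Vertex {
        final String partitionId;

        Vertex(String partitionId) {
            this.partitionId = partitionId;
        }

        // Assumption: cancelling a vertex, even one that already finished,
        // triggers a release call for the partition of its current execution.
        void cancel(Set<String> released) {
            released.add(partitionId);
        }
    }

    /** Returns the partitions released when the job reaches a terminal status. */
    static Set<String> onGloballyTerminal(Status status, List<Vertex> vertices) {
        Set<String> released = new LinkedHashSet<>();
        if (status == Status.FINISHED) {
            // Nothing cancels the executions, so the JobMaster has to release
            // the partitions itself (the case the FINISHED branch handles).
            for (Vertex v : vertices) {
                released.add(v.partitionId);
            }
        } else {
            // CANCELED / FAILED: cancelling every vertex already releases them.
            for (Vertex v : vertices) {
                v.cancel(released);
            }
        }
        return released;
    }

    public static void main(String[] args) {
        List<Vertex> vertices = Arrays.asList(new Vertex("A-p0"), new Vertex("B-p0"));
        for (Status s : Status.values()) {
            System.out.println(s + " -> " + onGloballyTerminal(s, vertices));
        }
    }
}
```

Every status prints the same set, e.g. `FINISHED -> [A-p0, B-p0]`. The two paths differ only in *who* issues the release, which is why only FINISHED needs special handling in this model.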
   
   However, with region failover, an execution may need to re-run after it has
reached FINISHED. This happens in topologies like `A -> C, B -> C`: A and B have
both finished, but C fails to read data from A, and then B also needs to
re-execute. When the execution vertex is reset via `resetForNewExecution`, the
result partitions of the previous execution are not released. Later, when we
`cancel` or `failGlobal` the ExecutionGraph, only the latest execution of each
vertex is considered, so the result partitions of the previous executions may
never be released explicitly.
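Here is a minimal sketch of the leak, again using hypothetical types rather than Flink's. Each vertex keeps one partition per execution attempt. `resetForNewExecution` here only mimics the behaviour described above: it adds an attempt without releasing the old partition. Cancellation looks only at the latest attempt. Only A and B are modelled, and C is omitted.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy model of the leak; all types and methods here are hypothetical. */
final class RegionFailoverLeakSketch {

    /** Hypothetical vertex that remembers the partition of every execution attempt. */
    static final class VertexModel {
        private final String name;
        private final List<String> attemptPartitions = new ArrayList<>();

        VertexModel(String name) {
            this.name = name;
            attemptPartitions.add(name + "-attempt0");
        }

        /** Mirrors resetForNewExecution: starts a new attempt, old partition NOT released. */
        void resetForNewExecution() {
            attemptPartitions.add(name + "-attempt" + attemptPartitions.size());
        }

        /** The only partition that cancel / failGlobal looks at in this model. */
        String latestPartition() {
            return attemptPartitions.get(attemptPartitions.size() - 1);
        }

        List<String> allPartitions() {
            return attemptPartitions;
        }
    }

    /** Release path that only considers the latest execution of each vertex. */
    static Set<String> releaseLatestOnly(List<VertexModel> vertices) {
        Set<String> released = new HashSet<>();
        for (VertexModel v : vertices) {
            released.add(v.latestPartition());
        }
        return released;
    }

    /** Assumed alternative: release every partition ever produced. */
    static Set<String> releaseAllProduced(List<VertexModel> vertices) {
        Set<String> released = new HashSet<>();
        for (VertexModel v : vertices) {
            released.addAll(v.allPartitions());
        }
        return released;
    }

    /** Partitions that were produced but never released. */
    static Set<String> leaked(List<VertexModel> vertices, Set<String> released) {
        Set<String> result = new HashSet<>();
        for (VertexModel v : vertices) {
            for (String p : v.allPartitions()) {
                if (!released.contains(p)) {
                    result.add(p);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        VertexModel a = new VertexModel("A");
        VertexModel b = new VertexModel("B");
        // A and B finished; C failed to read from A, so both are re-executed.
        a.resetForNewExecution();
        b.resetForNewExecution();
        List<VertexModel> vertices = Arrays.asList(a, b);

        System.out.println("latest only: " + leaked(vertices, releaseLatestOnly(vertices)));
        System.out.println("all produced: " + leaked(vertices, releaseAllProduced(vertices)));
    }
}
```

With `releaseLatestOnly`, the attempt-0 partitions of A and B are reported as leaked. Releasing from a record of every produced partition (`releaseAllProduced`, the kind of data a JobMaster-side tracker could plausibly hold) leaves nothing behind. This is one possible remedy under the model's assumptions, not necessarily what this PR implements.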
