zhijiangW commented on issue #7186: [FLINK-10941] Keep slots which contain 
unconsumed result partitions
URL: https://github.com/apache/flink/pull/7186#issuecomment-468555786
 
 
   @tillrohrmann also mentioned this issue in the discussion of the proposed
shuffle manager.
   The life cycles of `Task` and `ResultPartition` should be decoupled, while
both of them occupy slot resources. If a `Task` finishes but its
`ResultPartition` has not been consumed completely, the slot should not be
released, which keeps the `TaskExecutor` active.
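   A minimal sketch of the decoupled life cycles described above. `SlotOccupancy` and `TaskExecutorModel` are hypothetical names invented for illustration, not Flink classes. The assumption is that a slot stays occupied while its task runs or while any partition it produced is still unconsumed, and that an executor stays active while any slot is occupied.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model (not Flink code): a slot stays occupied while its task
// is running OR while any partition it produced is still unconsumed.
final class SlotOccupancy {
    private boolean taskRunning = true;
    private final Set<String> unconsumedPartitions = new HashSet<>();

    void partitionProduced(String partitionId) {
        unconsumedPartitions.add(partitionId);
    }

    // The task's life cycle ends here, independently of its partitions.
    void taskFinished() {
        taskRunning = false;
    }

    // Called once a partition may be released (by whichever release level applies).
    void partitionReleased(String partitionId) {
        unconsumedPartitions.remove(partitionId);
    }

    // The slot can be freed only after both life cycles have ended.
    boolean canReleaseSlot() {
        return !taskRunning && unconsumedPartitions.isEmpty();
    }
}

// Hypothetical stand-in for a TaskExecutor: it stays active while any of its
// slots is still occupied.
final class TaskExecutorModel {
    private final List<SlotOccupancy> slots = new ArrayList<>();

    SlotOccupancy allocateSlot() {
        SlotOccupancy slot = new SlotOccupancy();
        slots.add(slot);
        return slot;
    }

    boolean isActive() {
        for (SlotOccupancy slot : slots) {
            if (!slot.canReleaseSlot()) {
                return true;
            }
        }
        return false;
    }
}
```

In this model, finishing the task alone does not free the slot. The executor remains active until the last unconsumed partition has been released.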
   
   Regarding when to release a `ResultPartition`, I think there could be
three levels:
   
   First level: up to the `ResultPartition` itself, as it works currently,
triggered when the producer side finishes transporting the data.
   
   Second level: up to the consumer side, triggered when the consumer has
finished processing all the data. In some scenarios, this avoids restarting the
producer to reproduce the data if the consumer fails during processing.
   
   Third level: up to the `ShuffleMaster`, as proposed in the `ShuffleManager`
design. The `ShuffleMaster` manages partitions globally, so even after the
downstream side has consumed a partition completely, the `ShuffleMaster` can
still decide to keep it or to delay its release for other reasons.
   
   So I think the third level provides a mechanism flexible enough to support
all kinds of release possibilities.
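   A rough sketch of how the three levels could be expressed as one release decision. This is illustrative only: `ReleaseLevel`, `PartitionState`, `ShuffleMasterPolicy` and `ReleaseDecider` are hypothetical names and not part of Flink or of the `ShuffleManager` proposal. The third level is modelled as a pluggable global policy, which can reproduce either of the first two checks or override them.

```java
// Hypothetical sketch of the three release levels; none of these types are Flink APIs.
enum ReleaseLevel { PRODUCER, CONSUMER, SHUFFLE_MASTER }

// What has been observed so far for one result partition.
final class PartitionState {
    boolean producerFinishedTransport;
    boolean consumerFinishedProcessing;
}

// Global decision hook owned by the (hypothetical) shuffle master.
interface ShuffleMasterPolicy {
    boolean shouldRelease(String partitionId, PartitionState state);
}

final class ReleaseDecider {
    private ReleaseDecider() {}

    static boolean shouldRelease(ReleaseLevel level, String partitionId,
                                 PartitionState state, ShuffleMasterPolicy policy) {
        switch (level) {
            case PRODUCER:
                // Level 1: release once the producer has finished transporting the data.
                return state.producerFinishedTransport;
            case CONSUMER:
                // Level 2: keep the data until the consumer has processed all of it,
                // so a failed consumer can re-read it without restarting the producer.
                return state.consumerFinishedProcessing;
            case SHUFFLE_MASTER:
                // Level 3: defer to a global policy, which may keep a partition or delay
                // its release even after it has been fully consumed.
                return policy.shouldRelease(partitionId, state);
            default:
                throw new IllegalArgumentException("Unknown level: " + level);
        }
    }
}
```

For example, a policy such as `(id, st) -> st.consumerFinishedProcessing && !id.equals("p1")` behaves like level 2 for most partitions but keeps `p1` around. This illustrates why a global policy can cover the other two levels as special cases.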

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
