zhijiangW commented on issue #7186: [FLINK-10941] Keep slots which contain unconsumed result partitions
URL: https://github.com/apache/flink/pull/7186#issuecomment-468555786

@tillrohrmann also mentioned this issue in the discussion of the proposed shuffle manager. The life cycles of `Task` and `ResultPartition` should be decoupled, since both of them occupy slot resources. If a `Task` finishes but its `ResultPartition` has not been consumed completely, the slot should not be released, which keeps the `TaskExecutor` active.

Regarding when to release a `ResultPartition`, I think there might be three levels:

First level: Up to the `ResultPartition` itself, as it is done currently, triggered when the producer side finishes transporting the data.

Second level: Up to the consumer side, triggered when the consumer finishes processing all the data. In some scenarios this avoids restarting the producer to re-produce the data if the consumer fails during processing.

Third level: Up to the `ShuffleMaster` side, as proposed in `ShuffleManager`. `ShuffleMaster` manages partitions globally. Even if the partition has been consumed completely by the downstream side, `ShuffleMaster` can still decide not to release it, or to delay releasing it, for other reasons.

So I think the third level has the mechanism to support all of these possibilities.
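
To make the three levels concrete, here is a rough Java sketch of how the release decision could differ per level. All names in it (`ReleaseLevel`, `PartitionState`, `PartitionReleaseSketch`, and the three boolean flags) are hypothetical and are not existing Flink classes; it only illustrates the idea under the assumption that each level can be reduced to a single condition.

```java
// Hypothetical illustration only: none of these types exist in Flink.

/** Who decides when a result partition may be released. */
enum ReleaseLevel { PRODUCER, CONSUMER, SHUFFLE_MASTER }

/** Assumed, simplified view of a partition's state. */
final class PartitionState {
    final boolean transferFinished;  // producer finished transporting all data
    final boolean consumerFinished;  // consumer finished processing all data
    final boolean masterApproved;    // global manager agreed to release

    PartitionState(boolean transferFinished, boolean consumerFinished, boolean masterApproved) {
        this.transferFinished = transferFinished;
        this.consumerFinished = consumerFinished;
        this.masterApproved = masterApproved;
    }
}

final class PartitionReleaseSketch {

    /** Returns true if the partition may be released under the given level. */
    static boolean canReleasePartition(ReleaseLevel level, PartitionState state) {
        switch (level) {
            case PRODUCER:
                // First level: release once data transport is done.
                return state.transferFinished;
            case CONSUMER:
                // Second level: wait until the consumer has processed everything.
                return state.consumerFinished;
            case SHUFFLE_MASTER:
                // Third level: the global manager decides, even after full consumption.
                return state.masterApproved;
            default:
                throw new IllegalArgumentException("Unknown level: " + level);
        }
    }

    /** The slot stays occupied until the task is finished AND the partition is releasable. */
    static boolean canReleaseSlot(boolean taskFinished, ReleaseLevel level, PartitionState state) {
        return taskFinished && canReleasePartition(level, state);
    }

    public static void main(String[] args) {
        // Data fully transported, but the consumer is still processing it.
        PartitionState state = new PartitionState(true, false, false);
        for (ReleaseLevel level : ReleaseLevel.values()) {
            System.out.println(level + " -> slot releasable: " + canReleaseSlot(true, level, state));
        }
    }
}
```

In this sketch the task has finished in all three cases, yet the slot is only releasable under the first level; under the second and third levels the unconsumed or unapproved partition keeps the slot occupied.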
