azagrebin commented on issue #7186: [FLINK-10941] Keep slots which contain unconsumed result partitions
URL: https://github.com/apache/flink/pull/7186#issuecomment-468950331

@zhijiangW thanks for the explanation. `PartitionRequestQueue.channelInactive` and `handleException` should be sufficient to catch network problems on the producer side.

One more thing: as I understand it, the original problem is that the TCP connection can be closed abruptly when the producer's task executor shuts down. I am wondering whether we really have to change and delay the release of partition resources in the producer (the resources, e.g. buffers, do not seem to be needed after `EndOfPartitionEvent` has been flushed). Could we instead introduce a separate `close()` method on the reader/subpartition and an `isClosed` flag in the partition/subpartition, and use this flag to drive executor shutdown in `JobMaster` rather than trying to reuse `isReleased`? `isClosed` would reflect the final state in the lifecycle of a network partition.

What do you think? Or is there a good reason to delay the release of subpartition resources as well?
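To make the proposal concrete, here is a minimal sketch of keeping `isReleased` (buffers freed) separate from `isClosed` (final lifecycle state). It is not Flink code: `SketchSubpartition`, `SketchShutdownCheck`, `flushAndRelease()` and `canShutDownExecutor()` are hypothetical names, and the point at which `close()` gets called (here, once the consumer no longer needs the connection) is an assumption.

```java
import java.util.ArrayDeque;
import java.util.List;
import java.util.Queue;

// Hypothetical stand-in for a subpartition; not Flink's actual class.
final class SketchSubpartition {
    private final Queue<String> buffers = new ArrayDeque<>();
    private boolean finished; // end-of-partition marker has been added
    private boolean released; // buffer resources have been freed
    private boolean closed;   // final lifecycle state (assumed semantics)

    void add(String buffer) {
        if (finished || released) {
            throw new IllegalStateException("subpartition already finished or released");
        }
        buffers.add(buffer);
    }

    void finish() {
        finished = true;
    }

    // Simulates flushing everything, including the end-of-partition marker,
    // and then freeing the buffers right away (no delayed release).
    void flushAndRelease() {
        buffers.clear();
        released = true;
    }

    // Assumption: called once the consumer no longer needs the connection.
    void close() {
        closed = true;
    }

    boolean isReleased() {
        return released;
    }

    boolean isClosed() {
        return closed;
    }
}

// Hypothetical check a JobMaster-like component could use to drive shutdown.
final class SketchShutdownCheck {
    static boolean canShutDownExecutor(List<SketchSubpartition> subpartitions) {
        for (SketchSubpartition subpartition : subpartitions) {
            if (!subpartition.isClosed()) {
                return false; // released buffers alone are not enough
            }
        }
        return true;
    }
}
```

In this sketch a subpartition can be released while still not closed, so the executor stays alive until `close()` has been called on every subpartition, without holding on to the buffers in the meantime.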
