azagrebin commented on issue #7186: [FLINK-10941] Keep slots which contain 
unconsumed result partitions
URL: https://github.com/apache/flink/pull/7186#issuecomment-468950331
 
 
   @zhijiangW 
   thanks for the explanation. `PartitionRequestQueue.channelInactive` and 
`handleException` should be sufficient to catch network problems on the 
producer side.
   
   One more thing: as I understand it, the original problem is that the TCP 
connection can be closed abruptly because of the producer's task executor 
shutdown. I am wondering whether we really have to change and delay the 
release of partition resources in the producer (they, e.g. buffers, do not 
seem to be needed after `EndOfPartitionEvent` has been flushed). Instead, we 
could introduce a separate `close()` method on the reader/subpartition and an 
`isClosed` flag in the partition/subpartition, and use this flag to drive 
executor shutdown in `JobMaster` instead of trying to reuse `isReleased`. 
`isClosed` would reflect the final state in the lifecycle of a network 
partition.
   What do you think? Or is there a good reason to delay the release of 
subpartition resources as well?
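
   To make the suggestion concrete, here is a rough sketch of keeping the two 
flags separate. It is only an illustration: the class name 
`SubpartitionLifecycleSketch` and the helper `canShutDownExecutor` are 
hypothetical and not existing Flink code, and I am assuming that `close()` 
also implies release.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical sketch, not Flink's actual subpartition class. */
public class SubpartitionLifecycleSketch {

    // Resources (e.g. buffers) have been freed.
    private final AtomicBoolean released = new AtomicBoolean(false);

    // Final lifecycle state: the partition is no longer needed by anyone.
    private final AtomicBoolean closed = new AtomicBoolean(false);

    /** Frees resources; could still happen right after EndOfPartitionEvent is flushed. */
    public void release() {
        released.set(true);
    }

    /** Marks the final state. Assumption: closing also releases resources. */
    public void close() {
        release();
        closed.set(true);
    }

    public boolean isReleased() {
        return released.get();
    }

    public boolean isClosed() {
        return closed.get();
    }

    /** Hypothetical check: the executor may shut down only when everything is closed. */
    public static boolean canShutDownExecutor(Iterable<SubpartitionLifecycleSketch> subpartitions) {
        for (SubpartitionLifecycleSketch subpartition : subpartitions) {
            if (!subpartition.isClosed()) {
                return false;
            }
        }
        return true;
    }
}
```

   With this split, a subpartition that is released but not yet closed would 
still keep its task executor alive, while its buffers can be freed early.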
