zhijiangW commented on a change in pull request #11351: [FLINK-16404][runtime] Solve the potential deadlock problem when reducing exclusive buffers to zero URL: https://github.com/apache/flink/pull/11351#discussion_r397618266
########## File path: flink-runtime/src/main/java/org/apache/flink/runtime/io/network/partition/consumer/InputChannel.java ########## @@ -56,6 +57,10 @@ protected final SingleInputGate inputGate; + protected long currentCheckpointId = -1; + + protected ChannelState channelState = ChannelState.CONSUMING; Review comment: For the case of `LocalInputChannel`, the current problem is that when the subparition view notifies the data available, then the respective local channel would be added into gate data queue. When the gate pools buffer from such local channel, it should be aware whether this local channel was already blocked by `CheckpointBarrierAligner` or not. If blocked, it should not return this buffer to upper component to avoid caching. Another option to solve this issue is to avoid adding local channel into gate queue via tagging the blocked state in `ResultSubpartitionView`. We already added this state in `NetworkSequenceViewReader` for remote channel. If we can migrate this state into `ResultSubpartitionView` level, then we can make reuse of this state for both remote and local channels. And I think it should be transparent to do the similar things either in `NetworkSequenceViewReader` or `ResultSubpartitionView`. ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services