zhijiangW commented on issue #8290: [FLINK-12070] [network] Implement new bounded blocking subpartitions URL: https://github.com/apache/flink/pull/8290#issuecomment-490427036 Hey @StephanEwen , after thinking through your concerns of when and how to release readers, I tried to understand your proposed three options. In general I also prefer to the second option because of the following reasons: - When partition is released explicitly, that probably means the corresponding reader/view should not be consumable any more from the semantic aspect. So the first option to make existing readers still consumable is just limited by previous mechanism/implementation of releasing view. - `CancelPartition` is not always reliable by design ATM. It is only determined by internal TCP mechanism, and we have no other ack mechanisms in flink stack. Different from `BufferResponse` we add the sequence number to avoid the message lost in network. In previous failover strategy, if the consumer fails and sends `CancelPartition` to the producer, even though this cancel message is not received by producer, the JM could also cancel the producer to avoid this issue. - Even though we enhance to make `CancelPartition` reliable, we could not determine the reasonable time when the reader is disposed. Because `CancelPartition` would be sent when consumer fails or `EndOfPartitionEvent` is read from some `RemoteInputChannel`. There might exist the scenario to cause reader disposed very delay. - In third option we still want to release view via netty thread, then we set the flag to be seen by netty in the next read. This is very similar with the way when consumer netty is aware of any exception/errors, it would call `RemoteInputChannel#onError` to trigger task thread sensitive during `getNextBuffer`. But I am not sure whether we could confirm the next read happen or when to happen. In the case of `RemoteInputChannel#onError`, it must happen via following call `notifyChannelNonEmpty`. Also there seems a bit hacky in logic for this option as you mentioned . In summary I think the first option could still make sense in current PR for most of the cases. It is easy to not consider the specific references. Then we could further improve it future if necessary after finding any serious problems in practice.
---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected] With regards, Apache Git Services
