[GitHub] [flink] zhijiangW commented on issue #8290: [FLINK-12070] [network] Implement new bounded blocking subpartitions

GitBox Wed, 08 May 2019 03:02:02 -0700

zhijiangW commented on issue #8290: [FLINK-12070] [network] Implement new 
bounded blocking subpartitions
URL: https://github.com/apache/flink/pull/8290#issuecomment-490427036
 
 
   Hey @StephanEwen , after thinking through your concerns of when and how to 
release readers, I tried to understand your proposed three options. In general 
I also prefer to the second option because of the following reasons:
   
   - When partition is released explicitly, that probably means the 
corresponding reader/view should not be consumable any more from the semantic 
aspect. So the first option to make existing readers still consumable is just 
limited by previous mechanism/implementation of releasing view.
   
   - `CancelPartition` is not always reliable by design ATM. It is only 
determined by internal TCP mechanism, and we have no other ack mechanisms in 
flink stack. Different from `BufferResponse` we add the sequence number to 
avoid the message lost in network.  In previous failover strategy, if the 
consumer fails and sends `CancelPartition` to the producer, even though this 
cancel message is not received by producer, the JM could also cancel the 
producer to avoid this issue.
   
   -  Even though we enhance to make `CancelPartition` reliable, we could not 
determine the reasonable time when the reader is disposed. Because 
`CancelPartition` would be sent when consumer fails or `EndOfPartitionEvent` is 
read from some `RemoteInputChannel`. There might exist the scenario to cause 
reader disposed very delay.
   
   - In third option we still want to release view via netty thread, then we 
set the flag to be seen by netty in the next read. This is very similar with 
the way when consumer netty is aware of any exception/errors, it would call 
`RemoteInputChannel#onError` to trigger task thread sensitive during 
`getNextBuffer`. But I am not sure whether we could confirm the next read 
happen or when to happen. In the case of `RemoteInputChannel#onError`, it must 
happen via following call `notifyChannelNonEmpty`. Also there seems a bit hacky 
in logic for this option as you mentioned .
   
   In summary I think the first option could still make sense in current PR for 
most of the cases. It is easy to not consider the specific references. Then we 
could further improve it future if necessary after finding any serious problems 
in practice.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

[GitHub] [flink] zhijiangW commented on issue #8290: [FLINK-12070] [network] Implement new bounded blocking subpartitions

Reply via email to