zhijiangW commented on a change in pull request #11687:
[FLINK-16536][network][checkpointing] Implement InputChannel state recovery for
unaligned checkpoint
URL: https://github.com/apache/flink/pull/11687#discussion_r407222896
##########
File path:
flink-runtime/src/main/java/org/apache/flink/runtime/io/network/partition/consumer/RemoteInputChannel.java
##########
@@ -149,16 +154,58 @@ void assignExclusiveSegments() throws IOException {
}
}
+ /**
+ * Reads the channel state data executed by netty thread, so it can make use of almost all the
+ * existing processes to avoid bringing additional race conditions with task thread. Also it can
+ * avoid introducing another thread pool to handle this work to make things more complex.
+ */
+ private void readInputChannelState() throws IOException {
+ while (true) {
+ Buffer buffer;
+ synchronized (bufferQueue) {
+ buffer = bufferQueue.takeBuffer();
+ if (buffer == null) {
+ if (isReleased()) {
+ return;
+ }
+
+ buffer = inputGate.getBufferPool().requestBuffer();
+ if (buffer != null) {
+ bufferQueue.addFloatingBuffer(buffer);
+ continue;
+ } else {
+ inputGate.getBufferProvider().addBufferListener(this);
+ isWaitingForStateBuffers = true;
+ return;
+ }
+ }
+ }
+
+ ChannelStateReader.ReadResult result = inputGate.stateReader.readInputData(channelInfo, buffer);
Review comment:
Yes, by design `readInputChannelState` can be executed concurrently by multiple
netty threads for different channels. In general I expect task processing to be
faster than reading state, so a single thread might not fill buffers quickly
enough to keep the task thread fed. Every channel also has its own exclusive
buffers, which can be used in parallel to speed up the recovery process.

I overlooked the `NotThreadSafe` annotation on `ChannelStateReaderImpl`.
Since every input channel handle actually creates a separate
`ChannelStateStreamReader` and its own stream, my assumption was that the state
of one input channel is never read by multiple threads, while the states of
different channels can be read by different threads concurrently. I will confirm
with Roman whether there are other limitations.
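
To make the assumption above concrete, here is a minimal sketch of the threading model I have in mind. It is not the Flink implementation: `PerChannelReader` and `ChannelRecoverySketch` are hypothetical stand-ins, a plain `ExecutorService` stands in for the netty threads, and the per-channel state is modelled as an array of non-negative ints. Each reader is created and used by exactly one task, so it needs no synchronization even though several channels are recovered in parallel.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical stand-in for a per-channel state reader; not thread-safe by itself. */
final class PerChannelReader {
    private final int[] data;
    private int position; // plain field: safe only while a single thread uses this reader

    PerChannelReader(int[] data) {
        this.data = data;
    }

    /** Returns the next value, or -1 once the state is exhausted (values assumed non-negative). */
    int readNext() {
        return position < data.length ? data[position++] : -1;
    }
}

/** Hypothetical driver: channels are recovered in parallel, each channel by one thread only. */
final class ChannelRecoverySketch {
    static Map<Integer, Long> recoverAll(Map<Integer, int[]> statePerChannel, int threads)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        Map<Integer, Long> sums = new ConcurrentHashMap<>(); // shared result map is thread-safe
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (Map.Entry<Integer, int[]> entry : statePerChannel.entrySet()) {
                // One task per channel: the reader never escapes the thread running this task.
                futures.add(pool.submit(() -> {
                    PerChannelReader reader = new PerChannelReader(entry.getValue());
                    long sum = 0;
                    for (int v = reader.readNext(); v != -1; v = reader.readNext()) {
                        sum += v;
                    }
                    sums.put(entry.getKey(), sum);
                }));
            }
            for (Future<?> future : futures) {
                future.get(); // wait for completion and propagate any failure
            }
        } finally {
            pool.shutdown();
        }
        return sums;
    }
}
```

If the real reader keeps shared mutable state across channels, this per-channel confinement would not be sufficient, which is exactly the point to confirm.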