zhijiangW commented on a change in pull request #11534: [FLINK-16537][network]
Implement ResultPartition state recovery for unaligned checkpoint
URL: https://github.com/apache/flink/pull/11534#discussion_r400623114
##########
File path:
flink-runtime/src/main/java/org/apache/flink/runtime/io/network/partition/PipelinedSubpartition.java
##########
@@ -89,6 +91,25 @@
         super(index, parent);
     }
 
+    @Override
+    public void initializeState() throws IOException, InterruptedException {
+        ReadResult readResult = ReadResult.HAS_MORE_DATA;
+        while (readResult == ReadResult.HAS_MORE_DATA) {
+            BufferBuilder bufferBuilder = parent.getBufferPool().requestBufferBuilderBlocking();
+            BufferConsumer bufferConsumer = bufferBuilder.createBufferConsumer();
+            readResult = parent.getChannelStateReader().readOutputData(subpartitionInfo, bufferBuilder);
+
+            // check whether there are some states data filled in this time
+            bufferConsumer.update();
Review comment:
> I guess update() was added to read the value that was written in ChannelStateReader?
Not really. `update()` is used to check whether any data was written by the
`readOutputData` call above, and it is used together with the
`bufferConsumer.getWrittenBytes()` call below. There are two implicit
limitations here:
- The `ReadResult` from `ChannelStateReader` only indicates whether more data
remains for future calls. It does not indicate whether the current call
actually read any data, especially for the first call. So we have to check
whether any data was written into the passed `BufferBuilder`.
- The cached position is currently only updated when `BufferConsumer#build()`
is called. To check the written position before constructing the slice
buffer, we have to call `update()` explicitly before calling
`bufferConsumer.getWrittenBytes()`. I once tried to break this rule by also
calling `update()` inside `BufferConsumer#getWrittenBytes()` and
`BufferConsumer#isFinished()`, but that caused many unit test failures and
broke the previous design of `BufferConsumer`, which might require additional
discussion. So I introduced a separate `update()` method on `BufferConsumer`
that can be called on demand.
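
To make the two limitations concrete, here is a minimal, self-contained sketch. It is not Flink code: `ToyBufferBuilder`, `ToyBufferConsumer`, `ToyStateReader` and `restore` are hypothetical stand-ins that only model the behaviour described above (a consumer that reports a cached writer position, and a read result that only describes what is left). Counting non-empty buffers is also an assumption made for illustration.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class ChannelStateRestoreSketch {

    enum ReadResult { HAS_MORE_DATA, NO_MORE_DATA }

    /** Hypothetical writer side: appends bytes and publishes its position. */
    static final class ToyBufferBuilder {
        private final byte[] segment;
        private final AtomicInteger position = new AtomicInteger(0);

        ToyBufferBuilder(int capacity) {
            this.segment = new byte[capacity];
        }

        void append(byte[] data) {
            int start = position.get();
            int length = Math.min(data.length, segment.length - start);
            System.arraycopy(data, 0, segment, start, length);
            position.set(start + length);
        }

        ToyBufferConsumer createBufferConsumer() {
            return new ToyBufferConsumer(position);
        }
    }

    /** Hypothetical reader side: reports only the writer position it last cached. */
    static final class ToyBufferConsumer {
        private final AtomicInteger writerPosition;
        private int cachedPosition;

        ToyBufferConsumer(AtomicInteger writerPosition) {
            this.writerPosition = writerPosition;
            this.cachedPosition = writerPosition.get();
        }

        void update() {
            cachedPosition = writerPosition.get();
        }

        int getWrittenBytes() {
            return cachedPosition;
        }
    }

    /** Hypothetical state reader: the result describes what is left, not what was just read. */
    static final class ToyStateReader {
        private final Deque<byte[]> chunks;

        ToyStateReader(List<byte[]> chunks) {
            this.chunks = new ArrayDeque<>(chunks);
        }

        ReadResult readOutputData(ToyBufferBuilder builder) {
            byte[] chunk = chunks.poll();
            if (chunk != null) {
                builder.append(chunk);
            }
            return chunks.isEmpty() ? ReadResult.NO_MORE_DATA : ReadResult.HAS_MORE_DATA;
        }
    }

    /** Same loop shape as in the diff; returns how many buffers actually hold state. */
    static int restore(ToyStateReader reader) {
        int nonEmptyBuffers = 0;
        ReadResult readResult = ReadResult.HAS_MORE_DATA;
        while (readResult == ReadResult.HAS_MORE_DATA) {
            ToyBufferBuilder builder = new ToyBufferBuilder(16);
            ToyBufferConsumer consumer = builder.createBufferConsumer();
            readResult = reader.readOutputData(builder);

            // Refresh the cached position; otherwise getWrittenBytes() still reports 0.
            consumer.update();
            if (consumer.getWrittenBytes() > 0) {
                nonEmptyBuffers++;
            }
        }
        return nonEmptyBuffers;
    }

    public static void main(String[] args) {
        // One chunk of state: the first call returns NO_MORE_DATA but did write data.
        System.out.println(restore(new ToyStateReader(Arrays.asList(new byte[] {1, 2, 3})))); // prints 1
        // No state at all: the first call also returns NO_MORE_DATA, with nothing written.
        System.out.println(restore(new ToyStateReader(Collections.<byte[]>emptyList()))); // prints 0
    }
}
```

In this toy model both calls return `NO_MORE_DATA` on the first iteration, so the result alone cannot distinguish "one buffer of state" from "no state". Only the written-bytes check after `update()` can.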
> Wouldn't creation of bufferConsumer after readOutputData() have the same effect?
Actually not. The proper usage is to create the `BufferConsumer` first and
then write data into the `BufferBuilder`. Otherwise, a `BufferConsumer`
created later cannot see the data written before its creation.