1996fanrui commented on code in PR #20233:
URL: https://github.com/apache/flink/pull/20233#discussion_r923351868
##########
flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/channel/ChannelStateWriterImpl.java:
##########
@@ -60,89 +63,78 @@
public class ChannelStateWriterImpl implements ChannelStateWriter {
private static final Logger LOG =
LoggerFactory.getLogger(ChannelStateWriterImpl.class);
- private static final int DEFAULT_MAX_CHECKPOINTS =
- 1000; // includes max-concurrent-checkpoints + checkpoints to be
aborted (scheduled via
- // mailbox)
Review Comment:
Hi @pnowojski , thanks for your review.
The `checkpointId < nextExpectedCheckpointId` may happen. For example,
`AlternatingWaitingForFirstBarrierUnaligned#barrierReceived` don't check
whether checkpointId is aborted, if received a barrier, it will call the
ChannelStateWriter.start().
- If `checkpointId < ongoingCheckpointId`, it should throw exception.
- If `checkpointId >= ongoingCheckpointId && checkpointId <
nextExpectedCheckpointId` that is `checkpointId >= ongoingCheckpointId &&
abortedCheckpointIds.contains(checkpointId)`, it should ignore the checkpoint.
- If `checkpointId >= ongoingCheckpointId &&
!abortedCheckpointIds.contains(checkpointId)`, the ongoingCheckpointId should
be updated, and start this checkpoint.
`checkpointId >= ongoingCheckpointId && checkpointId <
nextExpectedCheckpointId` and `checkpointId < ongoingCheckpointId` are
processed differently. So, we should use `ongoingCheckpointId`, and just update
it within `ChannelStateWriterImpl#start`
We should ignore checkpoint when the checkpointId is aborted, so we need
the `NavigableSet<Long> abortedCheckpointIds;`.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]