zhijiangW commented on a change in pull request #12478:
URL: https://github.com/apache/flink/pull/12478#discussion_r437906197
##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/channel/ChannelStateWriterImpl.java
##########
@@ -56,7 +56,7 @@
 public class ChannelStateWriterImpl implements ChannelStateWriter {
     private static final Logger LOG = LoggerFactory.getLogger(ChannelStateWriterImpl.class);
-    private static final int DEFAULT_MAX_CHECKPOINTS = 5; // currently, only single in-flight checkpoint is supported
+    private static final int DEFAULT_MAX_CHECKPOINTS = 1000; // includes max-concurrent-checkpoints + checkpoints to be aborted (scheduled via mailbox)
Review comment:
The current approach looks like a temporary workaround rather than an
elegant solution. This threshold was originally introduced to validate the
logic and to prevent invalid checkpoints from being retained in the writer
forever. If we fold the abort delay into the threshold, it loses its original
meaning as a guard, and we can no longer tell what a proper value for it
would be.
The proper fix would be to resolve the underlying race condition itself, but
that may take more effort than is feasible at the moment. So I think we are
leaving another piece of technical debt here for the future.
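
To make the concern concrete, here is a minimal sketch of such a guard. It
assumes a hypothetical `BoundedCheckpointGuard` class with `start` and `abort`
methods; these names are illustrative only and are not the actual
ChannelStateWriterImpl API. It shows why a tight limit rejects a new checkpoint
while the abort of the previous one is still pending, which is the situation
the larger threshold works around.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical illustration only; NOT Flink's ChannelStateWriterImpl. */
final class BoundedCheckpointGuard {
    private final int maxCheckpoints;
    private final Set<Long> inFlight = new HashSet<>();

    BoundedCheckpointGuard(int maxCheckpoints) {
        if (maxCheckpoints <= 0) {
            throw new IllegalArgumentException("maxCheckpoints must be positive");
        }
        this.maxCheckpoints = maxCheckpoints;
    }

    /** Registers a checkpoint; fails if it is a duplicate or the limit is reached. */
    synchronized void start(long checkpointId) {
        if (inFlight.contains(checkpointId)) {
            throw new IllegalStateException("checkpoint already started: " + checkpointId);
        }
        if (inFlight.size() >= maxCheckpoints) {
            throw new IllegalStateException("too many in-flight checkpoints: " + inFlight.size());
        }
        inFlight.add(checkpointId);
    }

    /** Removes a checkpoint; in the scenario discussed, this call may arrive late. */
    synchronized void abort(long checkpointId) {
        inFlight.remove(checkpointId);
    }

    synchronized int inFlightCount() {
        return inFlight.size();
    }

    public static void main(String[] args) {
        BoundedCheckpointGuard guard = new BoundedCheckpointGuard(1);
        guard.start(1L);
        try {
            // The abort of checkpoint 1 has not been processed yet.
            guard.start(2L);
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
        guard.abort(1L); // the delayed abort finally runs
        guard.start(2L); // now accepted
        System.out.println("in flight: " + guard.inFlightCount());
    }
}
```

With a limit of 1 the second `start` is rejected until the delayed abort runs.
Raising the limit hides that rejection, but then the limit no longer bounds
the number of stale entries in any meaningful way.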