[
https://issues.apache.org/jira/browse/FLINK-22805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17354188#comment-17354188
]
Jiayi Liao commented on FLINK-22805:
------------------------------------
This is a good point, but I think the root problem is that the periodic
checkpoint scheduler in {{CheckpointCoordinator}} is too simple to cover
different scenarios. We have run into several scenarios that the periodic
scheduler cannot satisfy:
* Transferring data from Kafka to a partitioned Hive table: users usually want
a checkpoint to happen as soon as possible once a Hive partition is finished.
* Different intervals and timeouts for different traffic levels: from the
user's perspective, what matters is how much data they need to reprocess if
the job fails, which means a shorter interval under heavy traffic and a longer
interval under light traffic.
At Bytedance, we have abstracted a {{CheckpointScheduler}} in
{{CheckpointCoordinator}} that is responsible for scheduling checkpoints and
can also be extended by users.
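
To make the idea concrete, here is a minimal sketch of what such a pluggable
scheduler could look like. It is not Bytedance's implementation and not part
of Flink's API: the interface shape, the method {{nextDelayMillis}}, the class
names, and the records-per-second input are all assumptions made for
illustration. It covers only the traffic-based scenario above, by bounding the
amount of data to reprocess after a failure.

```java
/** Hypothetical interface; Flink does not ship a class with this shape. */
interface CheckpointScheduler {
    /** Returns the delay in milliseconds before the next checkpoint is triggered. */
    long nextDelayMillis(long recordsPerSecond);
}

/** Reproduces the existing behaviour: a fixed interval regardless of traffic. */
final class FixedIntervalScheduler implements CheckpointScheduler {
    private final long intervalMillis;

    FixedIntervalScheduler(long intervalMillis) {
        this.intervalMillis = intervalMillis;
    }

    @Override
    public long nextDelayMillis(long recordsPerSecond) {
        return intervalMillis;
    }
}

/**
 * Picks the interval so that roughly maxReplayRecords records arrive between
 * two checkpoints, clamped to [minIntervalMillis, maxIntervalMillis].
 */
final class TrafficAwareScheduler implements CheckpointScheduler {
    private final long maxReplayRecords;
    private final long minIntervalMillis;
    private final long maxIntervalMillis;

    TrafficAwareScheduler(long maxReplayRecords, long minIntervalMillis, long maxIntervalMillis) {
        this.maxReplayRecords = maxReplayRecords;
        this.minIntervalMillis = minIntervalMillis;
        this.maxIntervalMillis = maxIntervalMillis;
    }

    @Override
    public long nextDelayMillis(long recordsPerSecond) {
        if (recordsPerSecond <= 0) {
            // No traffic observed: fall back to the longest allowed interval.
            return maxIntervalMillis;
        }
        // Time (ms) needed to accumulate maxReplayRecords at the current rate.
        long ideal = maxReplayRecords * 1000L / recordsPerSecond;
        return Math.max(minIntervalMillis, Math.min(maxIntervalMillis, ideal));
    }
}
```

With a budget of 1,000,000 records and bounds of 10 s and 10 min, this sketch
yields a 20 s interval at 50,000 records/s and falls back to the 10 min cap
when traffic is light. An event-driven trigger, such as the Hive partition
case, would need a different hook and is not covered here.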
> Dynamic configuration of Flink checkpoint interval
> --------------------------------------------------
>
> Key: FLINK-22805
> URL: https://issues.apache.org/jira/browse/FLINK-22805
> Project: Flink
> Issue Type: New Feature
> Components: Runtime / Checkpointing
> Affects Versions: 1.13.1
> Reporter: Fu Kai
> Priority: Critical
> Fix For: 1.14.0
>
>
> Flink currently does not support changing the checkpoint interval on the
> fly. This would be useful for use cases like backfill/cold-start from a
> stream containing the whole history.
>
> In the cold-start phase, resources are fully utilized and back-pressure is
> high for all upstream operators, causing checkpoints to time out constantly.
> The real production traffic is far lower than that, and the provisioned
> resources are capable of handling it.
>
> With a dynamically configurable checkpoint interval, the cold-start process
> can be sped up by checkpointing less frequently or by turning checkpointing
> off entirely. After the process is completed, the checkpoint interval can be
> set back to normal.
>