[
https://issues.apache.org/jira/browse/FLINK-27101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17606485#comment-17606485
]
Jiale Tan edited comment on FLINK-27101 at 9/22/22 3:11 AM:
------------------------------------------------------------
Hi folks,
I have opened [this|https://github.com/apache/flink/pull/20852] draft PR for option 3
as discussed above:
??Expose triggering checkpoint via CLI and/or REST API with some parameters to
choose incremental/full checkpoint.??
The API and implementation are very similar to the savepoint trigger.
I am new to contributing to Flink, so please let me know if I am headed in the right
direction. If yes, I may start a small FLIP / dev mailing list discussion.
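For context, here is a minimal sketch of what triggering a full checkpoint through such
a REST endpoint could look like, modeled on the existing savepoint trigger. The endpoint
path (/jobs/:jobid/checkpoints) and the checkpointType field are assumptions taken from
this discussion, not a confirmed API:
{code:java}
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class TriggerFullCheckpoint {
    public static void main(String[] args) throws Exception {
        String restAddress = "http://localhost:8081"; // JobManager REST endpoint
        String jobId = "<job-id>";                    // target job id

        // Assumed request body: a parameter choosing FULL vs INCREMENTAL.
        String body = "{\"checkpointType\": \"FULL\"}";

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(restAddress + "/jobs/" + jobId + "/checkpoints"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();

        // Like the savepoint trigger, the response is expected to carry a trigger id
        // that can be polled for the checkpoint's completion status.
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
{code}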
> Periodically break the chain of incremental checkpoint (trigger checkpoints
> via REST API)
> -----------------------------------------------------------------------------------------
>
> Key: FLINK-27101
> URL: https://issues.apache.org/jira/browse/FLINK-27101
> Project: Flink
> Issue Type: New Feature
> Components: Runtime / Checkpointing, Runtime / REST
> Reporter: Steven Zhen Wu
> Assignee: Jiale Tan
> Priority: Major
> Labels: pull-request-available
>
> Incremental checkpointing is almost a must for large-state jobs. It greatly
> reduces the bytes uploaded to DFS per checkpoint. However, incremental
> checkpointing has a few implications that are problematic for production
> operations. S3 is used as the example DFS for the rest of this description.
> 1. Because there is no way to deterministically know how far back an
> incremental checkpoint may refer to files uploaded to S3, it is very
> difficult to set an S3 bucket/object TTL. In one application, we have observed
> a Flink checkpoint referring to files uploaded over 6 months ago. An S3 TTL
> can therefore corrupt the Flink checkpoints.
> S3 TTL is important for a few reasons:
> - Purging orphaned files (like external checkpoints from previous deployments)
> to keep the storage cost in check. This problem can be addressed by
> implementing proper garbage collection (similar to the JVM's) that traverses
> the retained checkpoints from all jobs and follows their file references (a
> rough sketch is given after this list), but that is an expensive solution from
> an engineering cost perspective.
> - Security and privacy. E.g., there may be a requirement that Flink state
> cannot keep data for more than some duration threshold (hours/days/weeks). The
> application is expected to purge keys to satisfy the requirement. However, with
> incremental checkpoints and the way deletion works in RocksDB, it is hard to
> set an S3 TTL to purge S3 files. Even though those old S3 files don't contain
> live keys, they may still be referenced by retained Flink checkpoints.
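> A rough sketch of the garbage-collection idea from the first bullet above. The
> Inventory methods are hypothetical placeholders for job-registry and S3 listing
> code, not real Flink or AWS APIs:
> {code:java}
> import java.util.HashSet;
> import java.util.List;
> import java.util.Set;
>
> public class CheckpointFileGc {
>
>     /** Placeholder hooks; these would be wired to the job registry and S3 listing. */
>     interface Inventory {
>         List<String> listRetainedCheckpoints();              // retained checkpoints of all jobs
>         List<String> listReferencedFiles(String checkpoint); // files (incl. shared state) it refers to
>         List<String> listAllObjects();                       // every object under the checkpoint prefix
>         void delete(String object);
>     }
>
>     static void collect(Inventory inventory) {
>         // Mark phase: every file still reachable from some retained checkpoint is live.
>         Set<String> live = new HashSet<>();
>         for (String checkpoint : inventory.listRetainedCheckpoints()) {
>             live.addAll(inventory.listReferencedFiles(checkpoint));
>         }
>         // Sweep phase: delete everything under the checkpoint prefix that is not referenced.
>         for (String object : inventory.listAllObjects()) {
>             if (!live.contains(object)) {
>                 inventory.delete(object);
>             }
>         }
>     }
> }
> {code}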
> 2. Occasionally, corrupted checkpoint files (on S3) are observed, and as a
> result restoring from a checkpoint fails. With incremental checkpoints, it
> usually doesn't help to try older checkpoints, because they may refer to the
> same corrupted file. It is unclear whether the corruption happened before or
> during the S3 upload. This risk can be mitigated with periodic savepoints.
> It all boils down to a periodic full snapshot (checkpoint or savepoint) to
> deterministically break the chain of incremental checkpoints. Searching the
> Jira history, the behavior that FLINK-23949 [1] tried to fix is actually close
> to what we would need here.
> There are a few options:
> 1. Periodically trigger savepoints (via a control plane). This is actually not
> a bad practice and might be appealing to some people. The problem is that it
> requires a job deployment to break the chain of incremental checkpoints, and
> periodic job deployments may sound hacky. If we make the full-checkpoint-after-
> a-savepoint behavior (fixed in FLINK-23949) configurable, it might be an
> acceptable compromise. The benefit is that no job deployment is required after
> savepoints.
> 2. Build the feature into Flink's incremental checkpointing: periodically (with
> some cron-style config) trigger a full checkpoint to break the incremental
> chain. If the full checkpoint fails (for whatever reason), the following
> checkpoints should attempt a full checkpoint as well, until one full checkpoint
> completes successfully (see the first sketch after this list).
> 3. For the security/privacy requirement, the main thing is to apply compaction
> to the deleted keys, which could probably avoid references to the old files. Is
> there any RocksDB compaction that can achieve a full compaction removing old
> delete markers? Recent delete markers are fine (see the second sketch after
> this list).
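> A minimal sketch of the retry-until-success rule in option 2, assuming a simple
> policy object consulted before each checkpoint (names and structure are
> illustrative, not Flink code):
> {code:java}
> enum SnapshotType { INCREMENTAL, FULL }
>
> /** Decide per checkpoint whether to take a full or an incremental snapshot. */
> class BreakTheChainPolicy {
>     private final long fullCheckpointIntervalMillis; // cron schedule simplified to a period
>     private long lastSuccessfulFullCheckpointMillis;
>     private boolean fullCheckpointPending;
>
>     BreakTheChainPolicy(long fullCheckpointIntervalMillis, long now) {
>         this.fullCheckpointIntervalMillis = fullCheckpointIntervalMillis;
>         this.lastSuccessfulFullCheckpointMillis = now;
>     }
>
>     /** Called when a checkpoint is about to be triggered. */
>     synchronized SnapshotType nextCheckpointType(long now) {
>         if (fullCheckpointPending
>                 || now - lastSuccessfulFullCheckpointMillis >= fullCheckpointIntervalMillis) {
>             // Stay on FULL until one full checkpoint actually completes.
>             fullCheckpointPending = true;
>             return SnapshotType.FULL;
>         }
>         return SnapshotType.INCREMENTAL;
>     }
>
>     /** Called with the outcome of a full checkpoint attempt. */
>     synchronized void onFullCheckpointResult(boolean success, long now) {
>         if (success) {
>             fullCheckpointPending = false;
>             lastSuccessfulFullCheckpointMillis = now;
>         }
>     }
> }
> {code}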
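> For option 3, a hedged sketch of a manual full compaction with the plain RocksDB
> Java API: forcing compaction of the bottommost level rewrites the oldest SST
> files, which is where old delete markers (tombstones) can be dropped. How this
> would be exposed through Flink's RocksDB state backend is exactly the open
> question:
> {code:java}
> import org.rocksdb.CompactRangeOptions;
> import org.rocksdb.CompactRangeOptions.BottommostLevelCompaction;
> import org.rocksdb.Options;
> import org.rocksdb.RocksDB;
> import org.rocksdb.RocksDBException;
>
> public class FullCompactionExample {
>     public static void main(String[] args) throws RocksDBException {
>         RocksDB.loadLibrary();
>         try (Options options = new Options().setCreateIfMissing(true);
>              RocksDB db = RocksDB.open(options, "/tmp/rocksdb-example")) {
>
>             try (CompactRangeOptions compactOptions = new CompactRangeOptions()
>                     .setBottommostLevelCompaction(BottommostLevelCompaction.kForce)) {
>                 // null begin/end means the whole key range of the default column family.
>                 db.compactRange(db.getDefaultColumnFamily(), null, null, compactOptions);
>             }
>         }
>     }
> }
> {code}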
> [1] https://issues.apache.org/jira/browse/FLINK-23949