[
https://issues.apache.org/jira/browse/FLINK-32070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Konstantin Knauf updated FLINK-32070:
-------------------------------------
Fix Version/s: (was: 1.18.0)
> FLIP-306 Unified File Merging Mechanism for Checkpoints
> -------------------------------------------------------
>
> Key: FLINK-32070
> URL: https://issues.apache.org/jira/browse/FLINK-32070
> Project: Flink
> Issue Type: New Feature
> Components: Runtime / Checkpointing, Runtime / State Backends
> Affects Versions: 1.18.0
> Reporter: Zakelly Lan
> Assignee: Zakelly Lan
> Priority: Major
>
> The FLIP:
> [https://cwiki.apache.org/confluence/display/FLINK/FLIP-306%3A+Unified+File+Merging+Mechanism+for+Checkpoints]
>
> The creation of multiple checkpoint files can lead to a 'file flood' problem,
> in which a large number of files are written to the checkpoint storage in a
> short amount of time. This can cause issues in large clusters with high
> workloads, such as the creation and deletion of many files increasing the
> amount of file meta modification on DFS, leading to single-machine hotspot
> issues for meta maintainers (e.g. NameNode in HDFS). Additionally, the
> performance of object storage (e.g. Amazon S3 and Alibaba OSS) can
> significantly decrease when listing objects, which is necessary for object
> name de-duplication before creating an object, further affecting the
> performance of directory manipulation in the file system's perspective of
> view (See [hadoop-aws module
> documentation|https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html#:~:text=an%20intermediate%20state.-,Warning%20%232%3A%20Directories%20are%20mimicked,-The%20S3A%20clients],
> section 'Warning #2: Directories are mimicked').
> While many solutions have been proposed for individual types of state files
> (e.g. FLINK-11937 for keyed state (RocksDB) and FLINK-26803 for channel
> state), the file flood problems from each type of checkpoint file are similar
> and lack systematic view and solution. Therefore, the goal of this FLIP is to
> establish a unified file merging mechanism to address the file flood problem
> during checkpoint creation for all types of state files, including keyed,
> non-keyed, channel, and changelog state. This will significantly improve the
> system stability and availability of fault tolerance in Flink.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)