[jira] [Commented] (FLINK-32070) FLIP-306 Unified File Merging Mechanism for Checkpoints

xiaogang zhou (Jira) Sun, 07 Apr 2024 02:01:23 -0700


    [ 
https://issues.apache.org/jira/browse/FLINK-32070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17834636#comment-17834636
 ]


xiaogang zhou commented on FLINK-32070:
---------------------------------------

[~Zakelly] yes, sounds good, Let me take a look 

> FLIP-306 Unified File Merging Mechanism for Checkpoints
> -------------------------------------------------------
>
>                 Key: FLINK-32070
>                 URL: https://issues.apache.org/jira/browse/FLINK-32070
>             Project: Flink
>          Issue Type: New Feature
>          Components: Runtime / Checkpointing, Runtime / State Backends
>            Reporter: Zakelly Lan
>            Assignee: Zakelly Lan
>            Priority: Major
>             Fix For: 1.20.0
>
>
> The FLIP: 
> [https://cwiki.apache.org/confluence/display/FLINK/FLIP-306%3A+Unified+File+Merging+Mechanism+for+Checkpoints]
>  
> The creation of multiple checkpoint files can lead to a 'file flood' problem, 
> in which a large number of files are written to the checkpoint storage in a 
> short amount of time. This can cause issues in large clusters with high 
> workloads, such as the creation and deletion of many files increasing the 
> amount of file meta modification on DFS, leading to single-machine hotspot 
> issues for meta maintainers (e.g. NameNode in HDFS). Additionally, the 
> performance of object storage (e.g. Amazon S3 and Alibaba OSS) can 
> significantly decrease when listing objects, which is necessary for object 
> name de-duplication before creating an object, further affecting the 
> performance of directory manipulation in the file system's perspective of 
> view (See [hadoop-aws module 
> documentation|https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html#:~:text=an%20intermediate%20state.-,Warning%20%232%3A%20Directories%20are%20mimicked,-The%20S3A%20clients],
>  section 'Warning #2: Directories are mimicked').
> While many solutions have been proposed for individual types of state files 
> (e.g. FLINK-11937 for keyed state (RocksDB) and FLINK-26803 for channel 
> state), the file flood problems from each type of checkpoint file are similar 
> and lack systematic view and solution. Therefore, the goal of this FLIP is to 
> establish a unified file merging mechanism to address the file flood problem 
> during checkpoint creation for all types of state files, including keyed, 
> non-keyed, channel, and changelog state. This will significantly improve the 
> system stability and availability of fault tolerance in Flink.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Commented] (FLINK-32070) FLIP-306 Unified File Merging Mechanism for Checkpoints

Reply via email to