[ https://issues.apache.org/jira/browse/FLINK-32070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18087875#comment-18087875 ]

Rui Fan edited comment on FLINK-32070 at 6/10/26 8:25 AM:
----------------------------------------------------------

Hey [~zakelly] [~masteryhx] [~lijinzhong], cc [~ym]

While reading the discussion thread "[DISCUSS] FLIP-XXX: Independent Checkpoint 
Based On Pipeline Region" 
[https://lists.apache.org/thread/qpztk0jdpcmhomszjx63l53xv26xnmwf], I started 
wondering whether the Unified File Merging Mechanism is stable, and whether 
{{execution.checkpointing.unaligned.max-subtasks-per-channel-state-file}} 
(FLINK-26803) could be deprecated or removed.

While reading the code, I noticed a {{TODO}} in the 
[code|https://github.com/apache/flink/blob/f03c904426853ad3a62883d196b4f6b07c7ef365/flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/filemerging/SubtaskFileMergingManagerRestoreOperation.java#L92]
 that does not seem to be tracked by any JIRA, and I'd like to confirm 
whether it is a known gap. In 
{{SubtaskFileMergingManagerRestoreOperation#restore()}} there is:
{code:java}
// TODO support channel state restore for unaligned checkpoint.
{code}
This TODO was introduced by FLINK-32080. Meanwhile, FLINK-32084 ("Migrate 
current file merging of channel state into the file merging framework") is 
already Closed/Resolved, yet this restore-side gap remains open in the code 
without a follow-up JIRA. Is there a ticket tracking it that I missed?

Below is an analysis (produced with Claude) of what happens with unaligned 
checkpoints when {{file-merging.enabled = true}}:
{noformat}
- Channel state goes through file merging on the write path 
(ChannelStateCheckpointWriter → SegmentFileStateHandle), so its segments can 
share the same EXCLUSIVE physical file as keyed/operator state. 
- On restore, SubtaskFileMergingManagerRestoreOperation#restore() registers 
only keyed/operator handles and filters out channel state (the TODO), so the 
physical file's reference count is under-counted. 
- Reading still works initially, but once a later checkpoint discards the 
keyed/operator handles in that file, the ref count can drop to zero and the 
file gets deleted while channel state still references it, breaking a 
subsequent restore.
{noformat}
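To make the suspected failure mode concrete, here is a toy reference-counting model. It is only a sketch under the assumptions of the analysis above: the names {{PhysicalFileTracker}}, {{Segment}}, {{restore}} and {{channelStateLost}} are hypothetical and do not correspond to Flink's actual file-merging classes, and the real manager may track restored files differently. It shows that if channel-state segments are skipped on restore, discarding the keyed segment alone drops the shared file's count to zero.

{code:java}
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Toy model of the ref-count under-counting described in the analysis.
 * All names are hypothetical; this is NOT Flink's file-merging implementation.
 */
public class FileMergingRefCountSketch {

    enum StateKind { KEYED, OPERATOR, CHANNEL }

    /** Stand-in for a segment handle: an (offset, length) slice of a shared physical file. */
    record Segment(String physicalFile, StateKind kind, long offset, long length) {}

    /** Counts live segments per physical file and "deletes" a file when its count hits zero. */
    static final class PhysicalFileTracker {
        private final Map<String, Integer> refCounts = new HashMap<>();
        private final Set<String> deletedFiles = new HashSet<>();

        void register(Segment s) {
            refCounts.merge(s.physicalFile(), 1, Integer::sum);
        }

        void discard(Segment s) {
            int remaining = refCounts.merge(s.physicalFile(), -1, Integer::sum);
            if (remaining <= 0) {
                refCounts.remove(s.physicalFile());
                deletedFiles.add(s.physicalFile()); // physical file would be removed from storage
            }
        }

        boolean isDeleted(String physicalFile) {
            return deletedFiles.contains(physicalFile);
        }
    }

    /** Re-registers restored segments; skipping CHANNEL mimics the TODO in restore(). */
    static PhysicalFileTracker restore(List<Segment> restored, boolean registerChannelState) {
        PhysicalFileTracker tracker = new PhysicalFileTracker();
        for (Segment s : restored) {
            if (s.kind() == StateKind.CHANNEL && !registerChannelState) {
                continue; // channel state is filtered out, so it holds no reference
            }
            tracker.register(s);
        }
        return tracker;
    }

    /** True if the shared file is deleted while a channel-state segment still points into it. */
    static boolean channelStateLost(boolean registerChannelState) {
        String file = "shared-exclusive-file";
        Segment keyed = new Segment(file, StateKind.KEYED, 0, 100);
        Segment channel = new Segment(file, StateKind.CHANNEL, 100, 50);
        PhysicalFileTracker tracker = restore(List.of(keyed, channel), registerChannelState);
        // A later checkpoint no longer needs the keyed segment and discards it;
        // the channel segment is still referenced by the restored checkpoint.
        tracker.discard(keyed);
        return tracker.isDeleted(file);
    }

    public static void main(String[] args) {
        System.out.println("channel state skipped on restore -> file deleted: " + channelStateLost(false));
        System.out.println("channel state registered on restore -> file deleted: " + channelStateLost(true));
    }
}
{code}

In this model, {{main}} prints {{true}} for the skipped case and {{false}} when channel state is registered, i.e. counting the channel segment on restore is what keeps the shared file alive.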
My questions:
 # Is this a real bug? If so, is there a JIRA tracking it, or should we open one?
 # If it is not a bug and channel state restore is actually stable, can FLINK-26803 ({{execution.checkpointing.unaligned.max-subtasks-per-channel-state-file}}) be deprecated/removed?
 # If there is no such risk, should {{file-merging.enabled}} be enabled by default in a future release, given that it has been available for a couple of years?

Please correct me directly if the analysis is wrong. Thanks!



> FLIP-306 Unified File Merging Mechanism for Checkpoints
> -------------------------------------------------------
>
>                 Key: FLINK-32070
>                 URL: https://issues.apache.org/jira/browse/FLINK-32070
>             Project: Flink
>          Issue Type: New Feature
>          Components: Runtime / Checkpointing, Runtime / State Backends
>            Reporter: Zakelly Lan
>            Assignee: Zakelly Lan
>            Priority: Major
>             Fix For: 2.4.0
>
>
> The FLIP: 
> [https://cwiki.apache.org/confluence/display/FLINK/FLIP-306%3A+Unified+File+Merging+Mechanism+for+Checkpoints]
>  
> The creation of multiple checkpoint files can lead to a 'file flood' problem, 
> in which a large number of files are written to the checkpoint storage in a 
> short amount of time. This can cause issues in large clusters with high 
> workloads: for example, creating and deleting many files increases the 
> number of file metadata modifications on the DFS, leading to single-machine 
> hotspots on the metadata maintainers (e.g. the NameNode in HDFS). 
> Additionally, the performance of object storage (e.g. Amazon S3 and Alibaba 
> OSS) can degrade significantly when listing objects, which is necessary for 
> object name de-duplication before creating an object; this further affects 
> the performance of directory manipulation from the file system's perspective 
> (see [hadoop-aws module 
> documentation|https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html#:~:text=an%20intermediate%20state.-,Warning%20%232%3A%20Directories%20are%20mimicked,-The%20S3A%20clients],
> section 'Warning #2: Directories are mimicked').
> While many solutions have been proposed for individual types of state files 
> (e.g. FLINK-11937 for keyed state (RocksDB) and FLINK-26803 for channel 
> state), the file flood problems from each type of checkpoint file are similar 
> and lack a systematic view and solution. Therefore, the goal of this FLIP is 
> to establish a unified file merging mechanism to address the file flood 
> problem during checkpoint creation for all types of state files, including 
> keyed, non-keyed, channel, and changelog state. This will significantly 
> improve system stability and the availability of fault tolerance in Flink.
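The core idea of the FLIP, many small logical state handles pointing at (offset, length) segments of a few shared physical files instead of one file each, can be illustrated with a toy in-memory writer. This is only a sketch: {{MergingWriter}}, {{SegmentHandle}} and the size-based roll-over policy are hypothetical and do not reflect how Flink actually chooses, shares, or closes physical files; a {{ByteArrayOutputStream}} stands in for a DFS or object-store file.

{code:java}
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy illustration of merging many small state writes into few physical files (hypothetical names). */
public class FileMergingSketch {

    /** Logical handle: which physical file, and where the segment lives inside it. */
    record SegmentHandle(int fileId, int offset, int length) {}

    /** Appends state blobs to the current physical file, rolling over when it would exceed maxFileSize. */
    static final class MergingWriter {
        private final int maxFileSize;
        private final List<ByteArrayOutputStream> files = new ArrayList<>();

        MergingWriter(int maxFileSize) {
            this.maxFileSize = maxFileSize;
        }

        SegmentHandle write(byte[] state) {
            ByteArrayOutputStream current = files.isEmpty() ? null : files.get(files.size() - 1);
            // Open a new physical file only if there is none yet or the current one is full.
            if (current == null || (current.size() > 0 && current.size() + state.length > maxFileSize)) {
                current = new ByteArrayOutputStream();
                files.add(current);
            }
            int offset = current.size();
            current.write(state, 0, state.length);
            return new SegmentHandle(files.size() - 1, offset, state.length);
        }

        byte[] read(SegmentHandle h) {
            byte[] file = files.get(h.fileId()).toByteArray();
            return Arrays.copyOfRange(file, h.offset(), h.offset() + h.length());
        }

        int physicalFileCount() {
            return files.size();
        }
    }

    public static void main(String[] args) {
        MergingWriter writer = new MergingWriter(1024);
        List<SegmentHandle> handles = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            handles.add(writer.write(("state-" + i).getBytes(StandardCharsets.UTF_8)));
        }
        System.out.println("logical handles: " + handles.size());
        System.out.println("physical files: " + writer.physicalFileCount());
        System.out.println("handle 42 reads: " + new String(writer.read(handles.get(42)), StandardCharsets.UTF_8));
    }
}
{code}

With a 1024-byte limit, the 100 small writes in {{main}} (790 bytes in total) land in a single physical file, and each handle still reads back only its own bytes, e.g. {{state-42}}. The trade-off this exposes is exactly the one discussed in the comment: once files are shared, deleting a file safely requires knowing every handle that still points into it.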



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
