[ 
https://issues.apache.org/jira/browse/FLINK-26803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17531182#comment-17531182
 ] 

fanrui commented on FLINK-26803:
--------------------------------

Hi [~pnowojski] you're right.

After my research, 15K of the 50K files are Unaligned Checkpoint, and the 
remaining 35K are from RocksDB. So for my this job, I think FLINK-11937 is 
useful for us.

Some of our important jobs hope that when the back pressure is severe, the CP 
can be successful. After research, we found that UC can solve this problem. 
However, due to the small file problem, we dare not use it in a large number of 
jobs. If UC (Unaligned Checkpoint) is enabled for all flink jobs, when external 
components such as Kafka slow down, a large number of flink jobs write Kafka 
slow. It will lead to severe back pressure, and a large number of jobs may 
write a large number of channel state files at this time. It may cause hdfs NN 
avalanche.

If UC is enabled for a small number of jobs, and they have low parallelism and 
a small number of tasks, the number of files doesn't matter. If UC is used for 
mass production, solving the small file problem is mostly about improving the 
availability and stability of the job. 

For code implementation, merging files should be easy, it's not like RocksDB 
Incremental Checkpoint has dependencies that need to be handled. And there 
should be no side effects for multiple tasks within a single TM to write the 
same hdfs file. Unless the ChannelState written by a single TM is very large, 
it can take too long for a single thread to write a single file. We can decide 
how many files to write based on the file size.

Please correct me if i'm wrong. Thanks a lot.

> Merge small ChannelState file for Unaligned Checkpoint
> ------------------------------------------------------
>
>                 Key: FLINK-26803
>                 URL: https://issues.apache.org/jira/browse/FLINK-26803
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / Checkpointing, Runtime / Network
>            Reporter: fanrui
>            Priority: Major
>
> When making an unaligned checkpoint, the number of ChannelState files is 
> TaskNumber * subtaskNumber. For high parallelism job, it writes too many 
> small files. It causes high load for hdfs NN.
>  
> In our production, a job writes more than 50K small files for each Unaligned 
> Checkpoint. Could we merge these files before write FileSystem? We can 
> configure the maximum number of files each TM can write in a single Unaligned 
> Checkpoint.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

Reply via email to