[ 
https://issues.apache.org/jira/browse/FLINK-26803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17531603#comment-17531603
 ] 

fanrui commented on FLINK-26803:
--------------------------------

Hi [~pnowojski], thanks for your feedback.

Sorry, I didn't express my thoughts clearly. Let me add some information.

For this job, the reduction is 30%. However, most Flink jobs do not have very
large state, so their number of files is usually small. If backpressure is
severe and UC (Unaligned Checkpoint) is enabled, the number of files for these
jobs becomes high.

{*}The idea I want to express is{*}: _For Flink jobs with small state and high
parallelism, and when there are many such jobs, optimizing ChannelStateFile may
reduce the number of files by more than 80%. I also believe that small-state
jobs are the majority in production._

When backpressure is not severe, HDFS runs normally. But if a large number of
jobs experience severe backpressure at the same time, the load on HDFS may
increase several times over. The way UC determines the number of files makes it
difficult for Flink SREs to estimate the required HDFS capacity.
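
As a rough illustration of the estimate above, here is a back-of-the-envelope
sketch in Python. It is only a sketch under stated assumptions: the job shape
(5 tasks, 1000 subtasks each, 250 TMs) is hypothetical and not taken from a
real job, and max_files_per_tm is a made-up name for the per-TM limit proposed
in the issue, not an existing Flink option.

```python
# Back-of-the-envelope estimate of ChannelState files per unaligned checkpoint.
# All numbers and parameter names below are hypothetical illustrations.

def unmerged_files(task_number: int, subtask_number: int) -> int:
    # Per the issue description: TaskNumber * subtaskNumber files.
    return task_number * subtask_number

def merged_files(tm_count: int, max_files_per_tm: int, unmerged: int) -> int:
    # Assumed upper bound if each TM writes at most max_files_per_tm files;
    # merging can never produce more files than the unmerged case.
    return min(unmerged, tm_count * max_files_per_tm)

def reduction(before: int, after: int) -> float:
    # Fraction of files saved by merging.
    return 1.0 - after / before

# Hypothetical job: 5 tasks, 1000 subtasks each, running on 250 TMs.
before = unmerged_files(5, 1000)
after = merged_files(250, 1, before)
print(before, after, f"{reduction(before, after):.0%}")
```

With a bound like this, the worst-case file count depends only on the number
of TMs and the configured limit, which is easier to plan HDFS capacity around.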

> Merge small ChannelState file for Unaligned Checkpoint
> ------------------------------------------------------
>
>                 Key: FLINK-26803
>                 URL: https://issues.apache.org/jira/browse/FLINK-26803
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / Checkpointing, Runtime / Network
>            Reporter: fanrui
>            Priority: Major
>
> When taking an unaligned checkpoint, the number of ChannelState files is 
> TaskNumber * subtaskNumber. For a high-parallelism job, this produces too 
> many small files, which puts a high load on the HDFS NameNode (NN).
>  
> In our production environment, one job writes more than 50K small files for 
> each Unaligned Checkpoint. Could we merge these files before writing them to 
> the FileSystem? We could make the maximum number of files each TM can write 
> in a single Unaligned Checkpoint configurable.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
