[ 
https://issues.apache.org/jira/browse/FLINK-11937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16917498#comment-16917498
 ] 

He Xiaoqiao commented on FLINK-11937:
-------------------------------------

[~klion26] Thanks for your proposal and contributions. We hit the same issue 
in our production environment, and I wonder whether there is any plan to push 
this feature forward? cc [~carp84]
I would like to offer some more information about what we observe on HDFS 
when hit by a 'small files flood':
1. It puts pressure on the NameNode through a huge heap footprint, and hurts 
its performance due to the high rate of write ops.
2. DataNode throughput is also impacted because of frequent GC. In my 
experience I have seen more than 6 million blocks on a single DataNode, with 
an average block size below 1 KB (originating from small files). As a result 
we have to configure an ever larger JVM heap just to hold the block object 
references, otherwise GC occurs frequently. Although we can work around this, 
it is not a graceful solution (a rough heap estimate follows below).
We look forward to seeing this JIRA move forward. Thanks again.
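A rough back-of-the-envelope estimate (illustrative numbers of my own, using 
the commonly cited rule of thumb of roughly 150 bytes of NameNode heap per 
file/directory/block object):

    100,000,000 small files, each occupying one block
      -> ~200,000,000 namespace objects (file + block)
      -> ~200,000,000 * 150 bytes ~= 30 GB of NameNode heap

and that is before the DataNode-side replica bookkeeping that drives the GC 
pressure described above.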

> Resolve small file problem in RocksDB incremental checkpoint
> ------------------------------------------------------------
>
>                 Key: FLINK-11937
>                 URL: https://issues.apache.org/jira/browse/FLINK-11937
>             Project: Flink
>          Issue Type: New Feature
>          Components: Runtime / Checkpointing
>            Reporter: Congxian Qiu(klion26)
>            Assignee: Congxian Qiu(klion26)
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.10.0
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Currently when incremental checkpoint is enabled in RocksDBStateBackend a 
> separate file will be generated on DFS for each sst file. This may cause a 
> “file flood” when running intensive workloads (many jobs with high 
> parallelism) in a big cluster. According to our observations in Alibaba 
> production, such a file flood introduces at least two drawbacks when using 
> HDFS as the checkpoint storage FileSystem: 1) a huge number of RPC requests 
> issued to the NN, which may overflow its response queue; 2) a huge number of 
> files putting heavy pressure on the NN’s on-heap memory.
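To give a sense of scale (illustrative numbers only, not taken from the 
issue): with 1,000 jobs of parallelism 100, each task uploading about 10 new 
sst files per incremental checkpoint and checkpointing once a minute, the 
cluster creates roughly

    1,000 jobs * 100 tasks * 10 sst files = 1,000,000 new DFS files per minute

plus the matching create/complete RPCs against the NN.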
> In Flink we noticed a similar small-file flood problem before and tried to 
> resolve it by introducing ByteStreamStateHandle (FLINK-2808), but this 
> solution has a limitation: if we configure the threshold too low there will 
> still be too many small files, while if it is too high the JM will 
> eventually OOM. It therefore can hardly resolve the issue when using 
> RocksDBStateBackend with the incremental snapshot strategy.
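For reference, a minimal sketch of how that threshold is tuned today, 
assuming the FsStateBackend constructor that takes a fileStateSizeThreshold 
(the knob behind ByteStreamStateHandle) wrapped by RocksDBStateBackend; 
raising the threshold trades DFS file count against JM heap:

    import java.net.URI;

    import org.apache.flink.contrib.streaming.state.RocksDBStateBackend;
    import org.apache.flink.runtime.state.filesystem.FsStateBackend;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class CheckpointThresholdExample {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env =
                    StreamExecutionEnvironment.getExecutionEnvironment();

            // State chunks below the threshold travel inline as
            // ByteStreamStateHandle instead of becoming individual DFS files.
            // The backend caps the threshold at 1 MB; larger inline handles
            // would accumulate on the JobManager and risk OOM.
            FsStateBackend checkpointFs =
                    new FsStateBackend(URI.create("hdfs:///flink/checkpoints"), 1 << 20);

            // With incremental snapshots, every sst file above the threshold
            // still becomes its own DFS file, which is the flood described here.
            env.setStateBackend(new RocksDBStateBackend(checkpointFs, true));

            env.fromElements(1, 2, 3).print();
            env.execute("checkpoint-threshold-example");
        }
    }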
> We propose a new OutputStream called 
> FileSegmentCheckpointStateOutputStream (FSCSOS) to fix the problem. FSCSOS 
> reuses the same underlying distributed file until its size exceeds a preset 
> threshold. We plan to complete the work in 3 steps: first introduce FSCSOS, 
> then resolve the specific storage amplification issue of FSCSOS, and finally 
> add an option to reuse FSCSOS across multiple checkpoints to further reduce 
> the number of DFS files.
> For more details, please refer to the attached design doc.
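For illustration only (this is not the attached design or the actual patch; 
all names and the handle type are made up), a rough sketch of the idea behind 
FSCSOS: many small state writes share one underlying file, each write returns 
an (offset, length) segment handle, and the stream rolls over to a new file 
once the preset threshold is exceeded:

    import java.io.Closeable;
    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.io.OutputStream;

    /**
     * Toy model of the FileSegmentCheckpointStateOutputStream idea; real code
     * would write through Flink's FileSystem abstraction to DFS, not java.io.
     */
    public class FileSegmentSketch implements Closeable {

        /** Hypothetical handle pointing into a shared file. */
        public static final class SegmentHandle {
            public final String file;
            public final long offset;
            public final long length;

            SegmentHandle(String file, long offset, long length) {
                this.file = file;
                this.offset = offset;
                this.length = length;
            }
        }

        private final String pathPrefix;
        private final long rollThreshold;
        private OutputStream current;
        private String currentFile;
        private long position;
        private int fileCounter;

        public FileSegmentSketch(String pathPrefix, long rollThreshold) throws IOException {
            this.pathPrefix = pathPrefix;
            this.rollThreshold = rollThreshold;
            roll();
        }

        /** Appends one piece of state and reports where it landed. */
        public SegmentHandle write(byte[] stateData) throws IOException {
            if (position >= rollThreshold) {
                roll(); // current shared file is "full": switch to a fresh one
            }
            long offset = position;
            current.write(stateData);
            position += stateData.length;
            return new SegmentHandle(currentFile, offset, stateData.length);
        }

        private void roll() throws IOException {
            if (current != null) {
                current.close();
            }
            currentFile = pathPrefix + "-" + (fileCounter++);
            current = new FileOutputStream(currentFile);
            position = 0;
        }

        @Override
        public void close() throws IOException {
            if (current != null) {
                current.close();
            }
        }
    }

With such a stream, the sst files of one checkpoint (or, after step 3, of 
several consecutive checkpoints) end up in a handful of shared DFS files 
instead of one file each.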



--
This message was sent by Atlassian Jira
(v8.3.2#803003)
