[
https://issues.apache.org/jira/browse/FLINK-11937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16918258#comment-16918258
]
Yu Li commented on FLINK-11937:
-------------------------------
Thanks for the message and details about your in-production case [~hexiaoqiao].
And yes we will continue pushing this forward and try our best to make it in
ASAP.
[~aljoscha] [~tzulitai] [~StephanEwen] FYI.
> Resolve small file problem in RocksDB incremental checkpoint
> ------------------------------------------------------------
>
> Key: FLINK-11937
> URL: https://issues.apache.org/jira/browse/FLINK-11937
> Project: Flink
> Issue Type: New Feature
> Components: Runtime / Checkpointing
> Reporter: Congxian Qiu(klion26)
> Assignee: Congxian Qiu(klion26)
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.10.0
>
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> Currently when incremental checkpoint is enabled in RocksDBStateBackend a
> separate file will be generated on DFS for each sst file. This may cause
> “file flood” when running intensive workload (many jobs with high
> parallelism) in big cluster. According to our observation in Alibaba
> production, such file flood introduces at lease two drawbacks when using HDFS
> as the checkpoint storage FileSystem: 1) huge number of RPC request issued to
> NN which may burst its response queue; 2) huge number of files causes big
> pressure on NN’s on-heap memory.
> In Flink we ever noticed similar small file flood problem and tried to
> resolved it by introducing ByteStreamStateHandle(FLINK-2808), but this
> solution has its limitation that if we configure the threshold too low there
> will still be too many small files, while if too high the JM will finally
> OOM, thus could hardly resolve the issue in case of using RocksDBStateBackend
> with incremental snapshot strategy.
> We propose a new OutputStream called
> FileSegmentCheckpointStateOutputStream(FSCSOS) to fix the problem. FSCSOS
> will reuse the same underlying distributed file until its size exceeds a
> preset threshold. We
> plan to complete the work in 3 steps: firstly introduce FSCSOS, secondly
> resolve the specific storage amplification issue on FSCSOS, and lastly add an
> option to reuse FSCSOS across multiple checkpoints to further reduce the DFS
> file number.
> More details please refer to the attached design doc.
--
This message was sent by Atlassian Jira
(v8.3.2#803003)