[
https://issues.apache.org/jira/browse/FLINK-11937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Flink Jira Bot updated FLINK-11937:
-----------------------------------
Labels: pull-request-available stale-assigned (was: pull-request-available)
I am the [Flink Jira Bot|https://github.com/apache/flink-jira-bot/] and I help
the community manage its development. I see this issue is assigned but has not
received an update in 14, so it has been labeled "stale-assigned".
If you are still working on the issue, please remove the label and add a
comment updating the community on your progress. If this issue is waiting on
feedback, please consider this a reminder to the committer/reviewer. Flink is a
very active project, and so we appreciate your patience.
If you are no longer working on the issue, please unassign yourself so someone
else may work on it. If the "warning_label" label is not removed in 7 days, the
issue will be automatically unassigned.
> Resolve small file problem in RocksDB incremental checkpoint
> ------------------------------------------------------------
>
> Key: FLINK-11937
> URL: https://issues.apache.org/jira/browse/FLINK-11937
> Project: Flink
> Issue Type: New Feature
> Components: Runtime / Checkpointing
> Reporter: Congxian Qiu
> Assignee: Congxian Qiu
> Priority: Major
> Labels: pull-request-available, stale-assigned
> Fix For: 1.14.0
>
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> Currently when incremental checkpoint is enabled in RocksDBStateBackend a
> separate file will be generated on DFS for each sst file. This may cause
> “file flood” when running intensive workload (many jobs with high
> parallelism) in big cluster. According to our observation in Alibaba
> production, such file flood introduces at lease two drawbacks when using HDFS
> as the checkpoint storage FileSystem: 1) huge number of RPC request issued to
> NN which may burst its response queue; 2) huge number of files causes big
> pressure on NN’s on-heap memory.
> In Flink we ever noticed similar small file flood problem and tried to
> resolved it by introducing ByteStreamStateHandle(FLINK-2808), but this
> solution has its limitation that if we configure the threshold too low there
> will still be too many small files, while if too high the JM will finally
> OOM, thus could hardly resolve the issue in case of using RocksDBStateBackend
> with incremental snapshot strategy.
> We propose a new OutputStream called
> FileSegmentCheckpointStateOutputStream(FSCSOS) to fix the problem. FSCSOS
> will reuse the same underlying distributed file until its size exceeds a
> preset threshold. We
> plan to complete the work in 3 steps: firstly introduce FSCSOS, secondly
> resolve the specific storage amplification issue on FSCSOS, and lastly add an
> option to reuse FSCSOS across multiple checkpoints to further reduce the DFS
> file number.
> More details please refer to the attached design doc.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)