[
https://issues.apache.org/jira/browse/FLINK-35853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17867159#comment-17867159
]
Jinzhong Li commented on FLINK-35853:
-------------------------------------
[~leekeiabstraction]
I think this is due to the change in how RocksDBStateBackend builds full
checkpoints, introduced by FLINK-28699.
Before FLINK-28699, RocksDBStateBackend iterated over all key-value pairs,
writing each pair to the checkpoint file.
After FLINK-28699, RocksDBStateBackend uses RocksDB's native checkpoint
mechanism, dumping all the SST files and then uploading them to DFS.
RocksDB is built on an LSM-tree architecture. When the space amplification of
the SST files is significant, the checkpoint-size gap between these two
approaches grows; conversely, it shrinks when the space amplification is less
pronounced.
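To illustrate the gap: in an LSM tree, repeatedly updating the same keys leaves stale versions behind in older SST files until compaction removes them. A key-by-key snapshot sees only the live values, while a native checkpoint copies the raw files, stale versions included. The following toy sketch models that effect in plain Java (this is not Flink or RocksDB code; all names and sizes are invented for illustration):

```java
import java.util.*;

// Toy model of LSM space amplification and its effect on full-checkpoint size.
public class SpaceAmplificationSketch {

    // Each "SST segment" is an immutable batch of key -> value entries; older
    // segments may hold stale versions of keys overwritten later.
    static final List<Map<Integer, String>> segments = new ArrayList<>();

    // Flushing a memtable produces a new immutable segment.
    static void flush(Map<Integer, String> memtable) {
        segments.add(new HashMap<>(memtable));
    }

    // Pre-FLINK-28699 style: iterate the live key-value pairs.
    // Result is the logical state size, independent of stale versions.
    static int iteratedSnapshotSize() {
        Map<Integer, String> live = new HashMap<>();
        for (Map<Integer, String> seg : segments) live.putAll(seg);
        return live.size();
    }

    // Post-FLINK-28699 style: dump the raw segment files as-is.
    // Result is the physical size, including stale versions not yet compacted.
    static int nativeCheckpointSize() {
        int total = 0;
        for (Map<Integer, String> seg : segments) total += seg.size();
        return total;
    }

    // Compaction merges segments and drops stale versions.
    static void compact() {
        Map<Integer, String> live = new HashMap<>();
        for (Map<Integer, String> seg : segments) live.putAll(seg);
        segments.clear();
        segments.add(live);
    }

    public static void main(String[] args) {
        // 10 rounds of updating the SAME 100 keys, flushing after each round.
        for (int round = 0; round < 10; round++) {
            Map<Integer, String> memtable = new HashMap<>();
            for (int k = 0; k < 100; k++) memtable.put(k, "v" + round);
            flush(memtable);
        }
        System.out.println("iterated snapshot size = " + iteratedSnapshotSize());
        System.out.println("native checkpoint size = " + nativeCheckpointSize());
        compact();
        System.out.println("native size after compaction = " + nativeCheckpointSize());
    }
}
```

Here the native-style size is 10x the iterated size until compaction runs, which mirrors the reported pattern of the checkpoint growing steadily and then dropping.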
> Regression in checkpoint size when performing full checkpointing in RocksDB
> ---------------------------------------------------------------------------
>
> Key: FLINK-35853
> URL: https://issues.apache.org/jira/browse/FLINK-35853
> Project: Flink
> Issue Type: Bug
> Components: Runtime / State Backends
> Affects Versions: 1.18.1
> Environment: amazon-linux-2023
> Reporter: Keith Lee
> Priority: Major
> Attachments: StaticStateSizeGenerator115.java,
> StaticStateSizeGenerator118.java
>
>
> We have a job with small and static state size (states are updated instead of
> added). The job is configured to use RocksDB + full checkpointing (incremental
> disabled) because the diff between checkpoints is larger than the full
> checkpoint size.
> After migrating to 1.18, we observed a significant and steady increase in full
> checkpoint size with RocksDB + full checkpointing. The increase was not
> observed with the hashmap state backend.
> I managed to reproduce the issue with following code:
> [^StaticStateSizeGenerator115.java]
> [^StaticStateSizeGenerator118.java]
> Result:
> On Flink 1.15, RocksDB + full checkpointing, checkpoint size is constant at
> 250KiB.
> On Flink 1.18, RocksDB + full checkpointing, the checkpoint size grew to
> 38MiB before dropping (presumably due to compaction?).
> On Flink 1.18, Hashmap statebackend, checkpoint size is constant at 219KiB.
> Notes:
> One observation I have is that the issue is more pronounced with higher
> parallelism; the reproduction code uses a parallelism of 8. The production
> application in which we first saw the regression reached checkpoint sizes in
> the GiB range, where we expected and observed (in 1.15) at most a couple of
> MiB.
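For reference, the setup described in the report (RocksDB state backend with incremental checkpoints disabled) corresponds to Flink configuration along these lines. This is a sketch only: the key names follow the Flink 1.18 configuration documentation, and the interval value is an invented example, not taken from the report:

```yaml
state.backend: rocksdb                 # embedded RocksDB state backend
state.backend.incremental: false       # full snapshots; incremental checkpoints disabled
execution.checkpointing.interval: 60s  # example interval, not from the report
```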
--
This message was sent by Atlassian Jira
(v8.20.10#820010)