[
https://issues.apache.org/jira/browse/FLINK-35853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17867159#comment-17867159
]
Jinzhong Li commented on FLINK-35853:
-------------------------------------
[~leekeiabstraction]
I think this is due to the change in how RocksDBStateBackend builds full
checkpoints, introduced by FLINK-28699.
Before FLINK-28699, RocksDBStateBackend iterated over all key-value pairs,
writing each pair to the checkpoint file.
After FLINK-28699, RocksDBStateBackend uses RocksDB's native checkpoint
mechanism, dumping all the SST files and then uploading them to DFS.
RocksDB is built on an LSM-tree architecture. When the space amplification of
the SST files is significant, the checkpoint-size gap between these two
approaches grows; conversely, it shrinks when the space amplification is less
pronounced.
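To illustrate the gap: in an LSM tree, repeatedly updating the same keys leaves stale versions behind in older SST files until compaction removes them. A key-by-key snapshot sees only the live values, while a native checkpoint copies the raw files, stale versions included. The following toy sketch models that effect in plain Java (this is not Flink or RocksDB code; all names and sizes are invented for illustration):

```java
import java.util.*;

// Toy model of LSM space amplification and its effect on full-checkpoint size.
public class SpaceAmplificationSketch {

    // Each "SST segment" is an immutable batch of key -> value entries; older
    // segments may hold stale versions of keys overwritten later.
    static final List<Map<Integer, String>> segments = new ArrayList<>();

    // Flushing a memtable produces a new immutable segment.
    static void flush(Map<Integer, String> memtable) {
        segments.add(new HashMap<>(memtable));
    }

    // Pre-FLINK-28699 style: iterate the live key-value pairs.
    // Result is the logical state size, independent of stale versions.
    static int iteratedSnapshotSize() {
        Map<Integer, String> live = new HashMap<>();
        for (Map<Integer, String> seg : segments) live.putAll(seg);
        return live.size();
    }

    // Post-FLINK-28699 style: dump the raw segment files as-is.
    // Result is the physical size, including stale versions not yet compacted.
    static int nativeCheckpointSize() {
        int total = 0;
        for (Map<Integer, String> seg : segments) total += seg.size();
        return total;
    }

    // Compaction merges segments and drops stale versions.
    static void compact() {
        Map<Integer, String> live = new HashMap<>();
        for (Map<Integer, String> seg : segments) live.putAll(seg);
        segments.clear();
        segments.add(live);
    }

    public static void main(String[] args) {
        // 10 rounds of updating the SAME 100 keys, flushing after each round.
        for (int round = 0; round < 10; round++) {
            Map<Integer, String> memtable = new HashMap<>();
            for (int k = 0; k < 100; k++) memtable.put(k, "v" + round);
            flush(memtable);
        }
        System.out.println("iterated snapshot size = " + iteratedSnapshotSize());
        System.out.println("native checkpoint size = " + nativeCheckpointSize());
        compact();
        System.out.println("native size after compaction = " + nativeCheckpointSize());
    }
}
```

Here the native-style size is 10x the iterated size until compaction runs, which mirrors the reported pattern of the checkpoint growing steadily and then dropping.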
> Regression in checkpoint size when performing full checkpointing in RocksDB
> ---------------------------------------------------------------------------
>
> Key: FLINK-35853
> URL: https://issues.apache.org/jira/browse/FLINK-35853
> Project: Flink
> Issue Type: Bug
> Components: Runtime / State Backends
> Affects Versions: 1.18.1
> Environment: amazon-linux-2023
> Reporter: Keith Lee
> Priority: Major
> Attachments: StaticStateSizeGenerator115.java,
> StaticStateSizeGenerator118.java
>
>
> We have a job with small and static state size (states are updated instead of
> added). The job is configured to use RocksDB + full checkpointing (incremental
> disabled) because the diff between checkpoints is larger than the full
> checkpoint size.
> After migrating to 1.18, we observed a significant and steady increase in full
> checkpoint size with RocksDB + full checkpointing. The increase was not
> observed with the hashmap state backend.
> I managed to reproduce the issue with following code:
> [^StaticStateSizeGenerator115.java]
> [^StaticStateSizeGenerator118.java]
> Result:
> On Flink 1.15, RocksDB + full checkpointing, checkpoint size is constant at
> 250KiB.
> On Flink 1.18, RocksDB + full checkpointing, the checkpoint size grew to
> 38MiB before dropping (presumably due to compaction?).
> On Flink 1.18, Hashmap statebackend, checkpoint size is constant at 219KiB.
> Notes:
> One observation I have is that the issue is more pronounced with higher
> parallelism; the reproduction code uses a parallelism of 8. The production
> application in which we first saw the regression reached checkpoint sizes in
> the GiB range, where we expected and observed (in 1.15) at most a couple of
> MiB.
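For reference, the setup described in the report (RocksDB state backend with incremental checkpoints disabled) corresponds to Flink configuration along these lines. This is a sketch only: the key names follow the Flink 1.18 configuration documentation, and the interval value is an invented example, not taken from the report:

```yaml
state.backend: rocksdb                 # embedded RocksDB state backend
state.backend.incremental: false       # full snapshots; incremental checkpoints disabled
execution.checkpointing.interval: 60s  # example interval, not from the report
```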
--
This message was sent by Atlassian Jira
(v8.20.10#820010)