[
https://issues.apache.org/jira/browse/FLINK-34050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17806599#comment-17806599
]
Jinzhong Li commented on FLINK-34050:
-------------------------------------
Thanks for your reply, [~mayuehappy] [~masteryhx].
IMO, it is unreasonable that redundant data can't be cleaned up for a long time
after rescaling. Especially in scenarios where disk space is very tight, this
behavior is a major drawback.
I agree that deleteRange + deleteFilesInRanges could be a good default
behavior.
As for the performance check of deleteRange + deleteFilesInRanges vs.
deleteRange alone, I think the rescaling-state benchmark should cover this [1].
WDYT?
[1]
https://github.com/apache/flink-benchmarks/blob/master/src/main/java/org/apache/flink/state/benchmark/RescalingBenchmarkBase.java
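For a quick sanity check alongside the benchmark, something like the following
standalone RocksJava sketch could show how much SST space deleteFilesInRanges
reclaims on top of a bare deleteRange (the class name, DB path, and data sizes
are arbitrary assumptions, not part of the benchmark or the Flink code path):
{code:java}
import org.rocksdb.FlushOptions;
import org.rocksdb.Options;
import org.rocksdb.RocksDB;
import org.rocksdb.RocksDBException;

import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class DeleteFilesInRangesCheck {

    public static void main(String[] args) throws RocksDBException {
        RocksDB.loadLibrary();
        try (Options options = new Options().setCreateIfMissing(true);
                RocksDB db = RocksDB.open(options, "/tmp/delete-files-in-ranges-check")) {
            byte[] value = new byte[1024];
            // Write 10 sorted batches and flush each one, so every batch ends
            // up in its own SST file covering a distinct key range.
            for (int batch = 0; batch < 10; batch++) {
                for (int i = 0; i < 10_000; i++) {
                    db.put(key(batch * 10_000 + i), value);
                }
                try (FlushOptions flushOptions = new FlushOptions().setWaitForFlush(true)) {
                    db.flush(flushOptions);
                }
            }
            System.out.println("sst size before: "
                    + db.getProperty("rocksdb.total-sst-files-size"));

            byte[] begin = key(0);
            byte[] end = key(50_000);
            // deleteRange only writes a range tombstone; the stale SST files
            // stay on disk until some compaction happens to touch them.
            db.deleteRange(begin, end);
            System.out.println("sst size after deleteRange: "
                    + db.getProperty("rocksdb.total-sst-files-size"));

            // deleteFilesInRanges physically removes the SST files that are
            // fully contained in [begin, end), reclaiming space immediately.
            db.deleteFilesInRanges(
                    db.getDefaultColumnFamily(), Arrays.asList(begin, end), false);
            System.out.println("sst size after deleteFilesInRanges: "
                    + db.getProperty("rocksdb.total-sst-files-size"));
        }
    }

    private static byte[] key(int i) {
        return String.format("key-%08d", i).getBytes(StandardCharsets.UTF_8);
    }
}
{code}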
> Rocksdb state has space amplification after rescaling with DeleteRange
> ----------------------------------------------------------------------
>
> Key: FLINK-34050
> URL: https://issues.apache.org/jira/browse/FLINK-34050
> Project: Flink
> Issue Type: Bug
> Components: Runtime / State Backends
> Reporter: Jinzhong Li
> Priority: Major
> Attachments: image-2024-01-10-21-23-48-134.png,
> image-2024-01-10-21-24-10-983.png, image-2024-01-10-21-28-24-312.png
>
>
> FLINK-21321 uses deleteRange to speed up RocksDB rescaling; however, it can
> cause space amplification in some cases.
> We can reproduce this problem with a WordCount job:
> 1) Before rescaling, the state operator in the WordCount job has parallelism
> 2 and a 4G+ full checkpoint size;
> !image-2024-01-10-21-24-10-983.png|width=266,height=130!
> 2) Then restart the job with parallelism 4 (for the state operator); the full
> checkpoint size of the new job grows to 8G+;
> 3) After many successful checkpoints, the full checkpoint size is still 8G+;
> !image-2024-01-10-21-28-24-312.png|width=454,height=111!
>
> The root cause of this issue is that the deleted keyGroupRange does not
> overlap with the current DB keyGroupRange, so new data written into RocksDB
> after rescaling is almost never compacted together with the deleted data
> (which belongs to other keyGroupRanges).
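>
> To make the key layout concrete, here is a minimal, purely illustrative
> sketch (the class name and the 2-byte prefix width are assumptions, not
> Flink's exact internals) of why the clipped range never sees new writes:
> {code:java}
> public class KeyGroupLayoutSketch {
>     // Key-group id serialized as a fixed-width big-endian key prefix
>     // (2 bytes here; illustrative only).
>     static byte[] keyGroupPrefix(int keyGroup) {
>         return new byte[] {(byte) (keyGroup >>> 8), (byte) keyGroup};
>     }
>
>     public static void main(String[] args) {
>         // Say the restored DB covers key groups [0, 128) but this subtask
>         // only owns [0, 64) after rescaling.
>         int keepEnd = 64, dbEnd = 128;
>         // The clipped prefix range [64, 128) lies entirely outside the live
>         // range, so new writes (all in [0, 64)) never trigger compactions
>         // that would merge the range tombstone with the stale SST data.
>         System.out.printf("clip prefixes [%s, %s)%n",
>                 java.util.Arrays.toString(keyGroupPrefix(keepEnd)),
>                 java.util.Arrays.toString(keyGroupPrefix(dbEnd)));
>     }
> }
> {code}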
>
> This space amplification may hurt RocksDB read performance and disk space
> usage after rescaling. It looks like a regression introduced by the
> deleteRange rescaling optimization.
>
> To solve this problem, I think we could invoke RocksDB.deleteFilesInRanges
> after deleteRange:
> {code:java}
> public static void clipDBWithKeyGroupRange() {
>     // ......
>     List<byte[]> ranges = new ArrayList<>();
>     // ......
>     // First, tombstone the stale key-group range as today.
>     deleteRange(db, columnFamilyHandles, beginKeyGroupBytes, endKeyGroupBytes);
>     ranges.add(beginKeyGroupBytes);
>     ranges.add(endKeyGroupBytes);
>     // ......
>     // Then physically drop the SST files fully contained in the deleted
>     // range instead of waiting for compaction to reclaim the space.
>     for (ColumnFamilyHandle columnFamilyHandle : columnFamilyHandles) {
>         db.deleteFilesInRanges(columnFamilyHandle, ranges, false);
>     }
> }
> {code}
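>
> Note that, IIUC, deleteFilesInRanges only removes SST files that are
> completely contained in the given ranges (the false argument keeps the range
> ends exclusive); files straddling a range boundary are left to the earlier
> deleteRange tombstones and normal compaction, so combining the two calls
> should stay correct while reclaiming most of the space immediately.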
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)