[
https://issues.apache.org/jira/browse/FLINK-34050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17806599#comment-17806599
]
Jinzhong Li commented on FLINK-34050:
-------------------------------------
Thanks for your reply, [~mayuehappy] [~masteryhx].
IMO, it is unreasonable that redundant data can't be cleaned up for a long time
after rescaling. Especially in scenarios where disk space is very tight, this
behavior is a major drawback.
I agree that deleteRange + deleteFilesInRanges could be a good default
behavior.
As for the performance check of deleteRange + deleteFilesInRanges vs.
deleteRange alone, I think the rescaling-state benchmark should cover this [1].
WDYT?
[1]
https://github.com/apache/flink-benchmarks/blob/master/src/main/java/org/apache/flink/state/benchmark/RescalingBenchmarkBase.java
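For a quick sanity check alongside the benchmark, something like the following
standalone RocksJava sketch could show how much SST space deleteFilesInRanges
reclaims on top of a bare deleteRange (the class name, DB path, and data sizes
are arbitrary assumptions, not part of the benchmark or the Flink code path):
{code:java}
import org.rocksdb.FlushOptions;
import org.rocksdb.Options;
import org.rocksdb.RocksDB;
import org.rocksdb.RocksDBException;

import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class DeleteFilesInRangesCheck {

    public static void main(String[] args) throws RocksDBException {
        RocksDB.loadLibrary();
        try (Options options = new Options().setCreateIfMissing(true);
                RocksDB db = RocksDB.open(options, "/tmp/delete-files-in-ranges-check")) {
            byte[] value = new byte[1024];
            // Write 10 sorted batches and flush each one, so every batch ends
            // up in its own SST file covering a distinct key range.
            for (int batch = 0; batch < 10; batch++) {
                for (int i = 0; i < 10_000; i++) {
                    db.put(key(batch * 10_000 + i), value);
                }
                try (FlushOptions flushOptions = new FlushOptions().setWaitForFlush(true)) {
                    db.flush(flushOptions);
                }
            }
            System.out.println("sst size before: "
                    + db.getProperty("rocksdb.total-sst-files-size"));

            byte[] begin = key(0);
            byte[] end = key(50_000);
            // deleteRange only writes a range tombstone; the stale SST files
            // stay on disk until some compaction happens to touch them.
            db.deleteRange(begin, end);
            System.out.println("sst size after deleteRange: "
                    + db.getProperty("rocksdb.total-sst-files-size"));

            // deleteFilesInRanges physically removes the SST files that are
            // fully contained in [begin, end), reclaiming space immediately.
            db.deleteFilesInRanges(
                    db.getDefaultColumnFamily(), Arrays.asList(begin, end), false);
            System.out.println("sst size after deleteFilesInRanges: "
                    + db.getProperty("rocksdb.total-sst-files-size"));
        }
    }

    private static byte[] key(int i) {
        return String.format("key-%08d", i).getBytes(StandardCharsets.UTF_8);
    }
}
{code}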
> Rocksdb state has space amplification after rescaling with DeleteRange
> ----------------------------------------------------------------------
>
> Key: FLINK-34050
> URL: https://issues.apache.org/jira/browse/FLINK-34050
> Project: Flink
> Issue Type: Bug
> Components: Runtime / State Backends
> Reporter: Jinzhong Li
> Priority: Major
> Attachments: image-2024-01-10-21-23-48-134.png,
> image-2024-01-10-21-24-10-983.png, image-2024-01-10-21-28-24-312.png
>
>
> FLINK-21321 uses deleteRange to speed up RocksDB rescaling; however, it can
> cause space amplification in some cases.
> We can reproduce this problem with a WordCount job:
> 1) Before rescaling, the state operator in the WordCount job has parallelism
> 2 and a 4G+ full checkpoint size;
> !image-2024-01-10-21-24-10-983.png|width=266,height=130!
> 2) Then restart the job with parallelism 4 (for the state operator); the full
> checkpoint size of the new job grows to 8G+;
> 3) After many successful checkpoints, the full checkpoint size is still 8G+;
> !image-2024-01-10-21-28-24-312.png|width=454,height=111!
>
> The root cause of this issue is that the deleted keyGroupRange does not
> overlap with the current DB keyGroupRange, so new data written into RocksDB
> after rescaling is almost never compacted together with the deleted data
> (which belongs to other keyGroupRanges).
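>
> To make the key layout concrete, here is a minimal, purely illustrative
> sketch (the class name and the 2-byte prefix width are assumptions, not
> Flink's exact internals) of why the clipped range never sees new writes:
> {code:java}
> public class KeyGroupLayoutSketch {
>     // Key-group id serialized as a fixed-width big-endian key prefix
>     // (2 bytes here; illustrative only).
>     static byte[] keyGroupPrefix(int keyGroup) {
>         return new byte[] {(byte) (keyGroup >>> 8), (byte) keyGroup};
>     }
>
>     public static void main(String[] args) {
>         // Say the restored DB covers key groups [0, 128) but this subtask
>         // only owns [0, 64) after rescaling.
>         int keepEnd = 64, dbEnd = 128;
>         // The clipped prefix range [64, 128) lies entirely outside the live
>         // range, so new writes (all in [0, 64)) never trigger compactions
>         // that would merge the range tombstone with the stale SST data.
>         System.out.printf("clip prefixes [%s, %s)%n",
>                 java.util.Arrays.toString(keyGroupPrefix(keepEnd)),
>                 java.util.Arrays.toString(keyGroupPrefix(dbEnd)));
>     }
> }
> {code}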
>
> This space amplification may hurt RocksDB read performance and disk space
> usage after rescaling. It looks like a regression introduced by the
> deleteRange rescaling optimization.
>
> To solve this problem, I think we could invoke RocksDB.deleteFilesInRanges
> after deleteRange:
> {code:java}
> public static void clipDBWithKeyGroupRange() {
>     // ......
>     List<byte[]> ranges = new ArrayList<>();
>     // ......
>     // First, tombstone the stale key-group range as today.
>     deleteRange(db, columnFamilyHandles, beginKeyGroupBytes, endKeyGroupBytes);
>     ranges.add(beginKeyGroupBytes);
>     ranges.add(endKeyGroupBytes);
>     // ......
>     // Then physically drop the SST files fully contained in the deleted
>     // range instead of waiting for compaction to reclaim the space.
>     for (ColumnFamilyHandle columnFamilyHandle : columnFamilyHandles) {
>         db.deleteFilesInRanges(columnFamilyHandle, ranges, false);
>     }
> }
> {code}
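>
> Note that, IIUC, deleteFilesInRanges only removes SST files that are
> completely contained in the given ranges (the false argument keeps the range
> ends exclusive); files straddling a range boundary are left to the earlier
> deleteRange tombstones and normal compaction, so combining the two calls
> should stay correct while reclaiming most of the space immediately.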
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)