[ https://issues.apache.org/jira/browse/FLINK-8790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16497788#comment-16497788 ]
ASF GitHub Bot commented on FLINK-8790:
---------------------------------------
Github user StefanRRichter commented on a diff in the pull request:
https://github.com/apache/flink/pull/5582#discussion_r192347037
--- Diff: flink-state-backends/flink-statebackend-rocksdb/src/test/java/org/apache/flink/contrib/streaming/state/RocksDBStateBackendTest.java ---
@@ -547,4 +549,30 @@ public boolean accept(File file, String s) {
 			return true;
 		}
 	}
+
+	private static class TestRocksDBStateBackend extends RocksDBStateBackend {
+
+		public TestRocksDBStateBackend(AbstractStateBackend checkpointStreamBackend, boolean enableIncrementalCheckpointing) {
+			super(checkpointStreamBackend, enableIncrementalCheckpointing);
+		}
+
+		@Override
+		public <K> AbstractKeyedStateBackend<K> createKeyedStateBackend(
+			Environment env,
+			JobID jobID,
+			String operatorIdentifier,
+			TypeSerializer<K> keySerializer,
+			int numberOfKeyGroups,
+			KeyGroupRange keyGroupRange,
+			TaskKvStateRegistry kvStateRegistry) throws IOException {
+
+			AbstractKeyedStateBackend<K> keyedStateBackend = super.createKeyedStateBackend(
+				env, jobID, operatorIdentifier, keySerializer, numberOfKeyGroups, keyGroupRange, kvStateRegistry);
+
+			// We ignore the range deletions on production, but when we are running the tests we shouldn't ignore it.
--- End diff --
Yes, I think that makes sense. My first question was whether you think that
using normal deletes would be prohibitively expensive. I think there is some
threshold at which restoring plus single deletes is cheaper than not restoring
plus filtered inserts. So the question is: up to what fraction of the key groups
is each strategy the better one? Do you think single deletes are only suitable
for removing one or two key groups, or up to what point will they be better than
starting with an empty database?
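The threshold asked about above can be made concrete with a toy linear cost model. This is a hypothetical sketch, not anything from Flink or RocksDB: the class name and the per-key costs `RESTORE`, `SCAN`, `DELETE` and `INSERT` are assumptions chosen for illustration, and `fractionOutside` is the fraction of the restored handle's keys that fall outside the target key group range.

```java
/**
 * Toy linear cost model for the two restore strategies discussed above.
 * Every cost constant is an illustrative assumption, not a measurement.
 */
final class RestoreStrategyCostModel {

    // Hypothetical relative costs per key.
    static final double RESTORE = 0.1; // restoring a handle wholesale (e.g. reusing its SST files)
    static final double SCAN = 0.5;    // visiting one key with an iterator
    static final double DELETE = 1.0;  // issuing one single delete
    static final double INSERT = 1.0;  // issuing one put

    /** Restore everything, then visit and single-delete the keys outside the target range. */
    public static double restoreThenDelete(long keys, double fractionOutside) {
        return keys * RESTORE + keys * fractionOutside * (SCAN + DELETE);
    }

    /** Start from an empty DB, scan every key and insert only those inside the target range. */
    public static double emptyThenFilteredInsert(long keys, double fractionOutside) {
        return keys * SCAN + keys * (1.0 - fractionOutside) * INSERT;
    }

    /** Out-of-range fraction f at which both strategies cost the same. */
    public static double breakEvenFraction() {
        // RESTORE + f * (SCAN + DELETE) = SCAN + (1 - f) * INSERT, solved for f.
        return (SCAN + INSERT - RESTORE) / (SCAN + DELETE + INSERT);
    }
}
```

With these assumed constants, `breakEvenFraction()` is 1.4 / 2.5 = 0.56: if fewer than about 56% of the restored keys fall outside the target range, restoring plus single deletes wins; above that, starting from an empty database wins. Real constants would have to come from benchmarks.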
> Improve performance for recovery from incremental checkpoint
> ------------------------------------------------------------
>
> Key: FLINK-8790
> URL: https://issues.apache.org/jira/browse/FLINK-8790
> Project: Flink
> Issue Type: Improvement
> Components: State Backends, Checkpointing
> Affects Versions: 1.5.0
> Reporter: Sihua Zhou
> Assignee: Sihua Zhou
> Priority: Major
> Fix For: 1.6.0
>
>
> When there are multiple state handles to be restored, we can improve the
> performance as follows:
> 1. Choose the best state handle to initialize the target db.
> 2. Use the other state handles to create temp dbs, and clip each db according
> to the target key group range (via rocksdb.deleteRange()). This helps us get
> rid of the `key group check` in the `data insertion loop` and also lets us
> avoid traversing useless records.
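The two steps in the description can be sketched as a minimal, self-contained model. This is not Flink's implementation: a `TreeMap` ordered by unsigned byte comparison stands in for a RocksDB instance, keys are assumed to carry a 2-byte big-endian key-group prefix, and `KeyGroupClipSketch`, `Handle`, `chooseInitialHandle` and `clip` are hypothetical names. "Best" is taken to mean the largest overlap with the target key group range, which is only one possible criterion. RocksDB's range delete removes the half-open range [beginKey, endKey) (exposed in RocksJava as `RocksDB.deleteRange(byte[], byte[])`), and the `deleteRange` helper below emulates that.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Sketch: a TreeMap ordered by unsigned bytes stands in for RocksDB (hypothetical names throughout).
final class KeyGroupClipSketch {

    /** Inclusive key-group range [start, end], in the spirit of Flink's KeyGroupRange. */
    static final class Range {
        final int start, end;
        Range(int start, int end) { this.start = start; this.end = end; }
        int overlap(Range o) { return Math.max(0, Math.min(end, o.end) - Math.max(start, o.start) + 1); }
    }

    /** Hypothetical state handle: a name plus the key-group range it covers. */
    static final class Handle {
        final String name;
        final Range range;
        Handle(String name, Range range) { this.name = name; this.range = range; }
    }

    /** Assumed layout: 2-byte big-endian key-group prefix (fewer than 65536 key groups). */
    static byte[] prefix(int keyGroup) {
        return new byte[] {(byte) (keyGroup >>> 8), (byte) keyGroup};
    }

    static byte[] key(int keyGroup, String userKey) {
        byte[] user = userKey.getBytes(StandardCharsets.UTF_8);
        byte[] key = Arrays.copyOf(prefix(keyGroup), 2 + user.length);
        System.arraycopy(user, 0, key, 2, user.length);
        return key;
    }

    /** RocksDB's default comparator orders keys bytewise, i.e. unsigned lexicographically. */
    static NavigableMap<byte[], String> newDb() {
        return new TreeMap<byte[], String>(Arrays::compareUnsigned);
    }

    /** Step 1, one possible criterion: pick the handle that overlaps the target range most. */
    static Handle chooseInitialHandle(List<Handle> handles, Range target) {
        Handle best = null;
        for (Handle h : handles) {
            if (best == null || h.range.overlap(target) > best.range.overlap(target)) {
                best = h;
            }
        }
        return best;
    }

    /** Emulates a half-open range delete: removes every key k with begin <= k < end. */
    static void deleteRange(NavigableMap<byte[], String> db, byte[] begin, byte[] end) {
        db.subMap(begin, true, end, false).clear();
    }

    /** Step 2: clip a temporary DB to the target range with two range deletes. */
    static void clip(NavigableMap<byte[], String> db, Range target, int numberOfKeyGroups) {
        deleteRange(db, prefix(0), prefix(target.start));
        deleteRange(db, prefix(target.end + 1), prefix(numberOfKeyGroups));
    }

    public static void main(String[] args) {
        int numberOfKeyGroups = 128;
        Range target = new Range(32, 63);
        NavigableMap<byte[], String> temp = newDb();
        for (int kg = 0; kg < numberOfKeyGroups; kg++) {
            temp.put(key(kg, "k" + kg), "v" + kg);
        }
        clip(temp, target, numberOfKeyGroups);
        // Everything left belongs to the target range, so it can be copied without a key-group check.
        NavigableMap<byte[], String> targetDb = newDb();
        targetDb.putAll(temp);
        System.out.println(targetDb.size()); // 32: key groups 32..63, one key each
    }
}
```

Because keys are sorted by their key-group prefix, the unwanted groups form at most two contiguous key ranges, so two range deletes clip a whole temporary DB regardless of how many keys it holds; the subsequent copy then needs no per-record key-group check.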