[jira] [Commented] (FLINK-8790) Improve performance for recovery from incremental checkpoint

ASF GitHub Bot (JIRA) Fri, 01 Jun 2018 02:43:35 -0700


    [ 
https://issues.apache.org/jira/browse/FLINK-8790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16497788#comment-16497788
 ]


ASF GitHub Bot commented on FLINK-8790:
---------------------------------------

Github user StefanRRichter commented on a diff in the pull request:

    https://github.com/apache/flink/pull/5582#discussion_r192347037
  
    --- Diff: 
flink-state-backends/flink-statebackend-rocksdb/src/test/java/org/apache/flink/contrib/streaming/state/RocksDBStateBackendTest.java
 ---
    @@ -547,4 +549,30 @@ public boolean accept(File file, String s) {
                        return true;
                }
        }
    +
    +   private static class TestRocksDBStateBackend extends 
RocksDBStateBackend {
    +
    +           public TestRocksDBStateBackend(AbstractStateBackend 
checkpointStreamBackend, boolean enableIncrementalCheckpointing) {
    +                   super(checkpointStreamBackend, 
enableIncrementalCheckpointing);
    +           }
    +
    +           @Override
    +           public <K> AbstractKeyedStateBackend<K> createKeyedStateBackend(
    +                   Environment env,
    +                   JobID jobID,
    +                   String operatorIdentifier,
    +                   TypeSerializer<K> keySerializer,
    +                   int numberOfKeyGroups,
    +                   KeyGroupRange keyGroupRange,
    +                   TaskKvStateRegistry kvStateRegistry) throws IOException 
{
    +
    +                   AbstractKeyedStateBackend<K> keyedStateBackend = 
super.createKeyedStateBackend(
    +                           env, jobID, operatorIdentifier, keySerializer, 
numberOfKeyGroups, keyGroupRange, kvStateRegistry);
    +
    +                   // We ignore the range deletions on production, but 
when we are running the tests we shouldn't ignore it.
    --- End diff --
    
    Yes, I think that makes sense. My first question was if you think that 
using normal deletes would be prohibitive expensive. I think there is some 
threshold where restore+single deletes is cheaper than no restore+filtered 
inserts. So the question is, for up to how much of the key group fraction which 
strategy is the best. Do you think the single deletes are just suitable to 
remove only like 1-2 key-groups or up to which point will this be better than 
starting with an empty database? 


> Improve performance for recovery from incremental checkpoint
> ------------------------------------------------------------
>
>                 Key: FLINK-8790
>                 URL: https://issues.apache.org/jira/browse/FLINK-8790
>             Project: Flink
>          Issue Type: Improvement
>          Components: State Backends, Checkpointing
>    Affects Versions: 1.5.0
>            Reporter: Sihua Zhou
>            Assignee: Sihua Zhou
>            Priority: Major
>             Fix For: 1.6.0
>
>
> When there are multi state handle to be restored, we can improve the 
> performance as follow:
> 1. Choose the best state handle to init the target db
> 2. Use the other state handles to create temp db, and clip the db according 
> to the target key group range (via rocksdb.deleteRange()), this can help use 
> get rid of the `key group check` in 
>  `data insertion loop` and also help us get rid of traversing the useless 
> record.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (FLINK-8790) Improve performance for recovery from incremental checkpoint

Reply via email to