[jira] [Commented] (FLINK-8790) Improve performance for recovery from incremental checkpoint

ASF GitHub Bot (JIRA) Fri, 01 Jun 2018 02:26:35 -0700


    [ 
https://issues.apache.org/jira/browse/FLINK-8790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16497771#comment-16497771
 ]


ASF GitHub Bot commented on FLINK-8790:
---------------------------------------

Github user sihuazhou commented on a diff in the pull request:

    https://github.com/apache/flink/pull/5582#discussion_r192342710
  
    --- Diff: 
flink-state-backends/flink-statebackend-rocksdb/src/test/java/org/apache/flink/contrib/streaming/state/RocksDBStateBackendTest.java
 ---
    @@ -547,4 +549,30 @@ public boolean accept(File file, String s) {
                        return true;
                }
        }
    +
    +   private static class TestRocksDBStateBackend extends 
RocksDBStateBackend {
    +
    +           public TestRocksDBStateBackend(AbstractStateBackend 
checkpointStreamBackend, boolean enableIncrementalCheckpointing) {
    +                   super(checkpointStreamBackend, 
enableIncrementalCheckpointing);
    +           }
    +
    +           @Override
    +           public <K> AbstractKeyedStateBackend<K> createKeyedStateBackend(
    +                   Environment env,
    +                   JobID jobID,
    +                   String operatorIdentifier,
    +                   TypeSerializer<K> keySerializer,
    +                   int numberOfKeyGroups,
    +                   KeyGroupRange keyGroupRange,
    +                   TaskKvStateRegistry kvStateRegistry) throws IOException 
{
    +
    +                   AbstractKeyedStateBackend<K> keyedStateBackend = 
super.createKeyedStateBackend(
    +                           env, jobID, operatorIdentifier, keySerializer, 
numberOfKeyGroups, keyGroupRange, kvStateRegistry);
    +
    +                   // We ignore the range deletions on production, but 
when we are running the tests we shouldn't ignore it.
    --- End diff --
    
    Does it make sense to let `chooseTheBestStateHandleToInit` to return a 
non-null instance only when there's one instance's key-group is fully covered 
by the target key-group. This way we won't clip anything, and the clip related 
code can be removed away(or still retain it in case we may use it in the 
future?).


> Improve performance for recovery from incremental checkpoint
> ------------------------------------------------------------
>
>                 Key: FLINK-8790
>                 URL: https://issues.apache.org/jira/browse/FLINK-8790
>             Project: Flink
>          Issue Type: Improvement
>          Components: State Backends, Checkpointing
>    Affects Versions: 1.5.0
>            Reporter: Sihua Zhou
>            Assignee: Sihua Zhou
>            Priority: Major
>             Fix For: 1.6.0
>
>
> When there are multi state handle to be restored, we can improve the 
> performance as follow:
> 1. Choose the best state handle to init the target db
> 2. Use the other state handles to create temp db, and clip the db according 
> to the target key group range (via rocksdb.deleteRange()), this can help use 
> get rid of the `key group check` in 
>  `data insertion loop` and also help us get rid of traversing the useless 
> record.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (FLINK-8790) Improve performance for recovery from incremental checkpoint

Reply via email to