[ 
https://issues.apache.org/jira/browse/FLINK-8533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16356449#comment-16356449
 ] 

ASF GitHub Bot commented on FLINK-8533:
---------------------------------------

GitHub user EronWright opened a pull request:

    https://github.com/apache/flink/pull/5427

    [FLINK-8533] [checkpointing] Support MasterTriggerRestoreHook state 
reinitialization

    
    Signed-off-by: Eron Wright <eronwri...@gmail.com>
    
    ## What is the purpose of the change
    Support MasterTriggerRestoreHook state re-initialization, to eliminate an 
edge case involving execution restarts where no checkpoint state is available.
    
    ## Brief change log
    - extend `MasterTriggerRestoreHook` with `initializeState` method.
    - invoke `initializeState` upon initial execution and upon global restart.
    
    ## Verifying this change
    - Revised test: `CheckpointCoordinatorMasterHooksTest`
    
    ## Does this pull request potentially affect one of the following parts:
    
      - Dependencies (does it add or upgrade a dependency):  no
      - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`:  **yes**
      - The serializers: no
      - The runtime per-record code paths (performance sensitive): no
      - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Yarn/Mesos, ZooKeeper: **yes**
      - The S3 file system connector:  no
    
    ## Documentation
      - Does this pull request introduce a new feature?  no
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/EronWright/flink 
FLINK-8533-hook-initialization

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/5427.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #5427
    
----
commit 9ad0e03c1aa81012ae14f598acdcf3eb76c9ec9f
Author: Eron Wright <eronwright@...>
Date:   2018-02-08T03:38:47Z

    [FLINK-8533] Support MasterTriggerRestoreHook state reinitialization
    
    - extend `MasterTriggerRestoreHook` with `initializeState` method.
    - invoke `initializeState` upon initial execution and upon global
    restart.
    
    Signed-off-by: Eron Wright <eronwri...@gmail.com>

----


> Support MasterTriggerRestoreHook state reinitialization
> -------------------------------------------------------
>
>                 Key: FLINK-8533
>                 URL: https://issues.apache.org/jira/browse/FLINK-8533
>             Project: Flink
>          Issue Type: Bug
>          Components: State Backends, Checkpointing
>    Affects Versions: 1.3.0
>            Reporter: Eron Wright 
>            Assignee: Eron Wright 
>            Priority: Major
>
> {{MasterTriggerRestoreHook}} enables coordination with an external system for 
> taking or restoring checkpoints. When execution is restarted from a 
> checkpoint, {{restoreCheckpoint}} is called to restore or reinitialize the 
> external system state. There's an edge case where the external state is not 
> adequately reinitialized, that is when execution fails _before the first 
> checkpoint_. In that case, the hook is not invoked and has no opportunity to 
> restore the external state to initial conditions.
> The impact is a loss of exactly-once semantics in this case. For example, in 
> the Pravega source function, the reader group state (e.g. stream position 
> data) is stored externally. In the normal restore case, the reader group 
> state is forcibly rewound to the checkpointed position. In the edge case 
> where no checkpoint has yet been successful, the reader group state is not 
> rewound and consequently some amount of stream data is not reprocessed.
> A possible fix would be to introduce an {{initializeState}} method on the 
> hook interface. Similar to {{CheckpointedFunction::initializeState}}, this 
> method would be invoked unconditionally upon hook initialization. The Pravega 
> hook would, for example, initialize or forcibly reinitialize the reader group 
> state.    



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to