[
https://issues.apache.org/jira/browse/FLINK-8533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16356449#comment-16356449
]
ASF GitHub Bot commented on FLINK-8533:
---------------------------------------
GitHub user EronWright opened a pull request:
https://github.com/apache/flink/pull/5427
[FLINK-8533] [checkpointing] Support MasterTriggerRestoreHook state
reinitialization
Signed-off-by: Eron Wright <[email protected]>
## What is the purpose of the change
Support MasterTriggerRestoreHook state re-initialization, to eliminate an
edge case involving execution restarts where no checkpoint state is available.
## Brief change log
- extend `MasterTriggerRestoreHook` with `initializeState` method.
- invoke `initializeState` upon initial execution and upon global restart.
## Verifying this change
- Revised test: `CheckpointCoordinatorMasterHooksTest`
## Does this pull request potentially affect one of the following parts:
- Dependencies (does it add or upgrade a dependency): no
- The public API, i.e., is any changed class annotated with
`@Public(Evolving)`: **yes**
- The serializers: no
- The runtime per-record code paths (performance sensitive): no
- Anything that affects deployment or recovery: JobManager (and its
components), Checkpointing, Yarn/Mesos, ZooKeeper: **yes**
- The S3 file system connector: no
## Documentation
- Does this pull request introduce a new feature? no
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/EronWright/flink
FLINK-8533-hook-initialization
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/flink/pull/5427.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #5427
----
commit 9ad0e03c1aa81012ae14f598acdcf3eb76c9ec9f
Author: Eron Wright <eronwright@...>
Date: 2018-02-08T03:38:47Z
[FLINK-8533] Support MasterTriggerRestoreHook state reinitialization
- extend `MasterTriggerRestoreHook` with `initializeState` method.
- invoke `initializeState` upon initial execution and upon global
restart.
Signed-off-by: Eron Wright <[email protected]>
----
> Support MasterTriggerRestoreHook state reinitialization
> -------------------------------------------------------
>
> Key: FLINK-8533
> URL: https://issues.apache.org/jira/browse/FLINK-8533
> Project: Flink
> Issue Type: Bug
> Components: State Backends, Checkpointing
> Affects Versions: 1.3.0
> Reporter: Eron Wright
> Assignee: Eron Wright
> Priority: Major
>
> {{MasterTriggerRestoreHook}} enables coordination with an external system for
> taking or restoring checkpoints. When execution is restarted from a
> checkpoint, {{restoreCheckpoint}} is called to restore or reinitialize the
> external system state. There's an edge case where the external state is not
> adequately reinitialized, that is when execution fails _before the first
> checkpoint_. In that case, the hook is not invoked and has no opportunity to
> restore the external state to initial conditions.
> The impact is a loss of exactly-once semantics in this case. For example, in
> the Pravega source function, the reader group state (e.g. stream position
> data) is stored externally. In the normal restore case, the reader group
> state is forcibly rewound to the checkpointed position. In the edge case
> where no checkpoint has yet been successful, the reader group state is not
> rewound and consequently some amount of stream data is not reprocessed.
> A possible fix would be to introduce an {{initializeState}} method on the
> hook interface. Similar to {{CheckpointedFunction::initializeState}}, this
> method would be invoked unconditionally upon hook initialization. The Pravega
> hook would, for example, initialize or forcibly reinitialize the reader group
> state.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)