GitHub user StephanEwen opened a pull request:
https://github.com/apache/flink/pull/3522
[FLINK-5823] [checkpoints] State Backends also handle Checkpoint Metadata
and introduce Global Cleanup Hooks
## Core Changes
### Store Metadata in State Backend
The state backend is now responsible for storing the checkpoint metadata.
There is no implicit assumption that the checkpoint metadata is stored in a
file systems any more.
- All checkpoint directory / savepoint directory specific config settings
are now part of the state backends. The Checkpoint Coordinator simply calls the
relevant methods on the state backends to store metadata.
- Similar as the `CheckpointStreamFactory` for storing checkpoint state,
there is now a `CheckpointMetadataStreamFactory` for the metadata.
- State backends are not required to be able to persist metadata - only
ifor HA setups and when externalized checkpoints are requested.
- All checkpoints with persisted metadata are addressable via a
"pointer", which is state-backend specific. For File-system based statebackends
(all statebackends in Flink currently), this pointer is the file path.
### Global cleanup hooks
State backends can implement an extended interface to create global cleanup
hooks. For example for a file system, a global cleanup hook simply recursively
deletes the checkpoint directory, which is for most file systems much faster
than issuing a delete call per file.
The `MemoryStateBackend` and `FsStateBackend` use fast cleanup hooks, the
`RocksDBStateBackend` should get then in a followup (see below).
### Application-defined State Backends pick up additional values from the
configuration
We need to keep supporting the scenario of setting a state backends in the
user program, but configuring parameters like checkpoint directory in the
cluster config. To support that, state backends may implement an additional
interface which lets them pick up configuration values from the cluster
configuration.
## Altered user-facing Behavior
- Externalized checkpoints store all files (state and metadata) strictly
in the same directory now. (Savepoints were contained in a single directory
already before)
- Both savepoint commands and configuration parameters now require
qualified URIs as well (i.e., `file:///path/do/savepoint`, whereas before the
configs and command line also excepted `/path/do/savepoint`.
Because not having qualified URIs is error-prone anyways (auto fallback
to local file system) I am actually in favor of doing this change.
## Tests
This adds a lot of tests, which can due to the changed design be done
completely on the state backends,
without instantiating a CheckpointCoordinator.
- Checkpoint / Savepoint delete with global hook works as expected
- Interaction of old cleanup logic and optional global cleanup hook
- State backends are properly loaded from configuration
- Application-configured State backends properly pick up additional
configuration values
## Followups
- Implement the global hooks for the RocksDB state backend. Because the
RocksDB state backend internally delegates to a "storage backend", this need to
few extra tricks and was not done in this already large pull request.
- The HA checkpoint store needs not store checkpoint metadata itself any
more, it can simply store the pointer.
- This abstraction allows state backends to over periodic cleanup hooks
that can search for lost state (files that are not referenced any more), which
can happen when TaskManager / JobManager processes fail during finalizing a
checkpoint or when handing over state between JobManager / TaskManager.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/StephanEwen/incubator-flink filestatebackend
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/flink/pull/3522.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #3522
----
commit 2bd817286f89693af4bb98504091afc1f45749ad
Author: Stephan Ewen <[email protected]>
Date: 2017-03-01T22:00:55Z
[FLINK-5823] [checkpoints] State Backends also handle Checkpoint Metadata
----
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---