[ https://issues.apache.org/jira/browse/FLINK-5778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15890214#comment-15890214 ]
ASF GitHub Bot commented on FLINK-5778:
---------------------------------------
GitHub user uce opened a pull request:
https://github.com/apache/flink/pull/3442
[FLINK-5778] [savepoints] Add savepoint serializer with relative file path
serialization
This adds a new savepoint version, `SavepointV2`. The corresponding
`SavepointV2Serializer` is the same as our current `SavepointV1Serializer`
except that `FileStateHandle` instances are serialized with their file path
relative to the savepoint base path.
As an example, imagine a savepoint in directory
`hdfs:///path/to/savepoint-directory` with these files:
```
hdfs:///path/to/savepoint-directory/_metadata
hdfs:///path/to/savepoint-directory/data-X
hdfs:///path/to/savepoint-directory/data-Y
```
Previously, the complete file path was stored. With this PR, we only store the
relative path (for example `data-X`) for each file state handle and reconstruct
the complete path from the savepoint directory on restore. This makes it
possible to move the savepoint directory to another location, even to a
different file system. The only requirement is that the layout within the
savepoint directory does not change. I think this is a reasonable restriction.
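The store/restore idea can be illustrated with a minimal sketch on plain strings. The class `RelativePaths` and its methods are hypothetical and not part of Flink (the real serializer works on Flink's own path and state handle types). It assumes every data file lives under the savepoint directory, matching the layout restriction above.

```java
// Hypothetical helper illustrating relative path serialization; not Flink's actual code.
public final class RelativePaths {

    private RelativePaths() {}

    /** Ensures the directory ends with a slash, so "/sp" does not match "/sp-other/...". */
    static String withTrailingSlash(String baseDir) {
        return baseDir.endsWith("/") ? baseDir : baseDir + "/";
    }

    /** On store: strip the savepoint directory prefix and keep only the relative part. */
    static String relativize(String baseDir, String absolutePath) {
        String base = withTrailingSlash(baseDir);
        if (!absolutePath.startsWith(base)) {
            throw new IllegalArgumentException(
                    absolutePath + " is not inside savepoint directory " + base);
        }
        return absolutePath.substring(base.length());
    }

    /** On restore: rebuild the absolute path from the (possibly new) savepoint directory. */
    static String resolve(String baseDir, String relativePath) {
        return withTrailingSlash(baseDir) + relativePath;
    }

    public static void main(String[] args) {
        String stored = relativize("hdfs:///path/to/savepoint-directory",
                "hdfs:///path/to/savepoint-directory/data-X");
        System.out.println(stored); // prints "data-X"

        // After moving the directory, the same stored entry resolves against the new location.
        System.out.println(resolve("file:///tmp/moved-savepoint", stored));
        // prints "file:///tmp/moved-savepoint/data-X"
    }
}
```

The trailing-slash check is a deliberate choice: a plain prefix test without it would wrongly accept a file in a sibling directory such as `savepoint-directory-old`.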
In addition to the added tests, I've tested this manually by triggering
savepoints, moving the savepoint directory within the local file system and to
HDFS, and restoring from it.
The code of `SavepointV1` and `SavepointV2` and of their respective
serializers is mostly shared. Therefore, I've moved the common logic into the
abstract base classes `AbstractSavepoint` and `AbstractSavepointSerializer`.
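One way to structure such a split is the template method pattern: the base class owns the shared (de)serialization loop and the subclasses only decide how a single file path is written. The sketch below is hypothetical; `AbstractPathSerializer`, `encodePath` and `decodePath` are illustrative names, and paths are plain strings rather than Flink's types.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical illustration of a shared base serializer with version-specific hooks.
public class SavepointSerializerSketch {

    static abstract class AbstractPathSerializer {

        /** Shared logic: write the handle count, then each path via the version hook. */
        final void serialize(String baseDir, String[] paths, DataOutputStream out) throws IOException {
            out.writeInt(paths.length);
            for (String path : paths) {
                out.writeUTF(encodePath(baseDir, path));
            }
            out.flush();
        }

        /** Shared logic: read the handle count, then decode each path via the version hook. */
        final String[] deserialize(String baseDir, DataInputStream in) throws IOException {
            String[] paths = new String[in.readInt()];
            for (int i = 0; i < paths.length; i++) {
                paths[i] = decodePath(baseDir, in.readUTF());
            }
            return paths;
        }

        abstract String encodePath(String baseDir, String absolutePath);

        abstract String decodePath(String baseDir, String storedPath);
    }

    /** V1-style: keep the complete path and ignore the savepoint directory. */
    static final class AbsolutePathSerializer extends AbstractPathSerializer {
        @Override String encodePath(String baseDir, String absolutePath) { return absolutePath; }
        @Override String decodePath(String baseDir, String storedPath) { return storedPath; }
    }

    /** V2-style: keep only the path relative to the savepoint directory. */
    static final class RelativePathSerializer extends AbstractPathSerializer {
        private static String withSlash(String dir) {
            return dir.endsWith("/") ? dir : dir + "/";
        }

        @Override String encodePath(String baseDir, String absolutePath) {
            String base = withSlash(baseDir);
            if (!absolutePath.startsWith(base)) {
                throw new IllegalArgumentException(absolutePath + " is outside " + base);
            }
            return absolutePath.substring(base.length());
        }

        @Override String decodePath(String baseDir, String storedPath) {
            return withSlash(baseDir) + storedPath;
        }
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        AbstractPathSerializer serializer = new RelativePathSerializer();
        serializer.serialize("hdfs:///sp", new String[] {"hdfs:///sp/data-X", "hdfs:///sp/data-Y"},
                new DataOutputStream(bytes));
        String[] restored = serializer.deserialize("file:///moved",
                new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
        System.out.println(String.join(",", restored)); // prints "file:///moved/data-X,file:///moved/data-Y"
    }
}
```

With this layout, a new format version only has to override the two hooks, while the count-and-loop logic stays in one place.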
The migration story is that you can resume from old savepoints as before, and
all newly triggered savepoints will be V2 savepoints that serialize file state
handles with their relative path. You can also resume from `1.3-SNAPSHOT`
savepoints without any issues.
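Resuming from both old and new savepoints presumably requires that each savepoint records its format version, so that restore can pick the matching decoding. A hypothetical sketch, assuming a header of a magic number followed by a version `int` (the constant `MAGIC`, the class name and the single-path payload are illustrative, not Flink's actual metadata format):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical sketch of version-based dispatch on restore; not Flink's actual format.
public class SavepointVersionSketch {

    // Illustrative header value only; the real metadata header is defined by Flink.
    static final int MAGIC = 0x53505431;

    static String withSlash(String dir) {
        return dir.endsWith("/") ? dir : dir + "/";
    }

    /** Writes one file path: version 1 stores it absolute, version 2 relative to baseDir. */
    static byte[] write(int version, String baseDir, String absolutePath) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeInt(MAGIC);
        out.writeInt(version);
        if (version == 1) {
            out.writeUTF(absolutePath);
        } else {
            String base = withSlash(baseDir);
            if (!absolutePath.startsWith(base)) {
                throw new IOException(absolutePath + " is outside " + base);
            }
            out.writeUTF(absolutePath.substring(base.length()));
        }
        out.flush();
        return bytes.toByteArray();
    }

    /** Reads the version first and picks the matching decoding, so old data still loads. */
    static String read(String baseDir, byte[] data) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        if (in.readInt() != MAGIC) {
            throw new IOException("Not a savepoint metadata file");
        }
        int version = in.readInt();
        String stored = in.readUTF();
        switch (version) {
            case 1:
                return stored; // absolute path, used as-is
            case 2:
                return withSlash(baseDir) + stored; // relative path, resolved on restore
            default:
                throw new IOException("Unknown savepoint version " + version);
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] v1 = write(1, "hdfs:///sp", "hdfs:///sp/data-X");
        byte[] v2 = write(2, "hdfs:///sp", "hdfs:///sp/data-X");
        // The directory was moved: V2 follows it, V1 still points at the old location.
        System.out.println(read("file:///moved", v1)); // prints "hdfs:///sp/data-X"
        System.out.println(read("file:///moved", v2)); // prints "file:///moved/data-X"
    }
}
```

The example also shows the limitation of old savepoints: a V1 entry keeps pointing at its original location, so only V2 savepoints can be relocated.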
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/uce/flink 5778-relocatable
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/flink/pull/3442.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #3442
----
commit 1bc3b3bff1b33eb204e8b9d4cd9589105dd60466
Author: Ufuk Celebi
Date: 2017-02-28T21:36:24Z
[FLINK-5778] [savepoints] Add savepoint serializer with relative file path
serialization
----
> Split FileStateHandle into fileName and basePath
> ------------------------------------------------
>
> Key: FLINK-5778
> URL: https://issues.apache.org/jira/browse/FLINK-5778
> Project: Flink
> Issue Type: Sub-task
> Components: State Backends, Checkpointing
> Reporter: Ufuk Celebi
> Assignee: Ufuk Celebi
>
> Store the statePath as a basePath and a fileName and allow overwriting the
> basePath. We cannot overwrite the base path as long as the state handle is
> still in flight and not persisted; otherwise we risk a resource leak.
> We need this in order to be able to relocate savepoints.
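> {code}
> interface RelativeBaseLocationStreamStateHandle {
>     // Called before a savepoint is stored, so that only the relative part is persisted.
>     void clearBaseLocation();
>     // Called after a savepoint is loaded, with the directory it is restored from.
>     void setBaseLocation(String baseLocation);
> }
> {code}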
> FileStateHandle should implement this, and the SavepointSerializer should
> forward the calls when a savepoint is stored or loaded: clear the base
> location before storing and set it after loading.
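The proposed interface could be wired up roughly as follows. This is a hypothetical sketch: only the interface comes from the issue, while `SketchFileStateHandle`, `store` and `load` are illustrative names and paths are plain strings. The handle keeps the base location and file name separately; storing clears the base location so only the file name is persisted, and loading sets it to the directory the savepoint is restored from.

```java
// Hypothetical wiring of the proposed interface; not Flink's actual FileStateHandle.
public class RelocatableHandleSketch {

    /** Interface as proposed in the issue description. */
    interface RelativeBaseLocationStreamStateHandle {
        void clearBaseLocation();

        void setBaseLocation(String baseLocation);
    }

    /** Keeps the base location and the file name as separate fields. */
    static final class SketchFileStateHandle implements RelativeBaseLocationStreamStateHandle {
        private final String fileName;
        private String baseLocation; // null while cleared

        SketchFileStateHandle(String baseLocation, String fileName) {
            this.baseLocation = baseLocation;
            this.fileName = fileName;
        }

        @Override
        public void clearBaseLocation() {
            baseLocation = null;
        }

        @Override
        public void setBaseLocation(String baseLocation) {
            this.baseLocation = baseLocation;
        }

        String getFileName() {
            return fileName;
        }

        /** The full path is only available while a base location is set. */
        String getFilePath() {
            if (baseLocation == null) {
                throw new IllegalStateException("Base location is not set");
            }
            return baseLocation + "/" + fileName;
        }
    }

    /** Store: clear the base location first, then persist only the file name. */
    static String store(SketchFileStateHandle handle) {
        handle.clearBaseLocation();
        return handle.getFileName();
    }

    /** Load: recreate the handle, then set the base location to the current savepoint directory. */
    static SketchFileStateHandle load(String savepointDir, String storedFileName) {
        SketchFileStateHandle handle = new SketchFileStateHandle(null, storedFileName);
        handle.setBaseLocation(savepointDir);
        return handle;
    }

    public static void main(String[] args) {
        SketchFileStateHandle handle = new SketchFileStateHandle("hdfs:///sp", "data-X");
        String persisted = store(handle);
        SketchFileStateHandle restored = load("file:///moved", persisted);
        System.out.println(restored.getFilePath()); // prints "file:///moved/data-X"
    }
}
```

Because `store` mutates the handle, a sketch like this should only be applied to handles that are already persisted; per the issue, changing the base path of a handle that is still in flight risks a resource leak.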
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)