[
https://issues.apache.org/jira/browse/FLINK-12751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Aljoscha Krettek updated FLINK-12751:
-------------------------------------
Component/s: (was: FileSystems)
Runtime / Coordination
Runtime / Checkpointing
> Create file based HA support
> ----------------------------
>
> Key: FLINK-12751
> URL: https://issues.apache.org/jira/browse/FLINK-12751
> Project: Flink
> Issue Type: Improvement
> Components: Runtime / Checkpointing, Runtime / Coordination
> Affects Versions: 1.8.0, 1.9.0, 2.0.0
> Environment: Flink on k8 and Mini cluster
> Reporter: Boris Lublinsky
> Priority: Major
> Labels: features, pull-request-available
> Original Estimate: 168h
> Time Spent: 10m
> Remaining Estimate: 167h 50m
>
> In the current Flink implementation, HA support can be implemented either
> using Zookeeper or Custom Factory class.
> Add HA implementation based on PVC. The idea behind this implementation
> is as follows:
> * Because implementation assumes a single instance of Job manager (Job
> manager selection and restarts are done by K8 Deployment of 1)
> URL management is done using StandaloneHaServices implementation (in the case
> of cluster) and EmbeddedHaServices implementation (in the case of mini
> cluster)
> * For management of the submitted Job Graphs, checkpoint counter and
> completed checkpoint an implementation is leveraging the following file
> system layout
> {code}
> ha -----> root of the HA data
> checkpointcounter -----> checkpoint counter folder
> <job ID> -----> job id folder
> <counter file> -----> counter file
> <another job ID> -----> another job id folder
> ...........
> completedCheckpoint -----> completed checkpoint folder
> <job ID> -----> job id folder
> <checkpoint file> -----> checkpoint file
> <another checkpoint file> -----> checkpoint file
> ...........
> <another job ID> -----> another job id folder
> ...........
> submittedJobGraph -----> submitted graph folder
> <job ID> -----> job id folder
> <graph file> -----> graph file
> <another job ID> -----> another job id folder
> ...........
> {code}
> An implementation should overwrites 2 of the Flink files:
> * HighAvailabilityServicesUtils - added `FILESYSTEM` option for picking HA
> service
> * HighAvailabilityMode - added `FILESYSTEM` to available HA options.
> The actual implementation adds the following classes:
> * `FileSystemHAServices` - an implementation of a `HighAvailabilityServices`
> for file system
> * `FileSystemUtils` - support class for creation of runtime components.
> * `FileSystemStorageHelper` - file system operations implementation for
> filesystem based HA
> * `FileSystemCheckpointRecoveryFactory` - an implementation of a
> `CheckpointRecoveryFactory`for file system
> * `FileSystemCheckpointIDCounter` - an implementation of a
> `CheckpointIDCounter` for file system
> * `FileSystemCompletedCheckpointStore` - an implementation of a
> `CompletedCheckpointStore` for file system
> * `FileSystemSubmittedJobGraphStore` - an implementation of a
> `SubmittedJobGraphStore` for file system
--
This message was sent by Atlassian Jira
(v8.3.4#803005)