[jira] [Commented] (FLINK-9114) Enable user-provided, custom CheckpointRecoveryFactory for non-HA deployments

Jacob Park (JIRA) Fri, 06 Apr 2018 09:52:29 -0700

    [ https://issues.apache.org/jira/browse/FLINK-9114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16428587#comment-16428587 ]


Jacob Park commented on FLINK-9114:
-----------------------------------

I was thinking of creating a ConfigurableCheckpointRecoveryFactory interface 
with a configure(Configuration config, Executor executor) method instead of a 
constructor that takes arguments, so that implementations are easy to load 
and instantiate reflectively. It would also live in a separate package to 
avoid class-loading conflicts when building a JAR (like flink-metrics).
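
A minimal sketch of what I have in mind, under some assumptions: the 
Configuration class below is a small stand-in for Flink's Configuration, 
ExampleFactory and the key "example.metadata-store" are invented for 
illustration, and the real interface would presumably also extend 
CheckpointRecoveryFactory. It only shows why a configure() method pairs well 
with reflective, no-argument instantiation.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Executor;

final class ConfigurableFactorySketch {

    /** Stand-in for Flink's Configuration; only string lookups are modeled. */
    static final class Configuration {
        private final Map<String, String> values = new HashMap<>();

        void setString(String key, String value) {
            values.put(key, value);
        }

        String getString(String key, String defaultValue) {
            return values.getOrDefault(key, defaultValue);
        }
    }

    /**
     * The proposed interface: set up after construction, not through a constructor.
     * (Extending CheckpointRecoveryFactory is omitted to keep the sketch self-contained.)
     */
    interface ConfigurableCheckpointRecoveryFactory {
        void configure(Configuration config, Executor executor);
    }

    /** Hypothetical user implementation that records what it was configured with. */
    static final class ExampleFactory implements ConfigurableCheckpointRecoveryFactory {
        String metadataStore;
        Executor executor;

        @Override
        public void configure(Configuration config, Executor executor) {
            // "example.metadata-store" is an invented key for this sketch.
            this.metadataStore = config.getString("example.metadata-store", "none");
            this.executor = executor;
        }
    }

    /** Instantiates by class name through the no-argument constructor, then configures. */
    static ConfigurableCheckpointRecoveryFactory instantiate(
            String className, Configuration config, Executor executor) {
        try {
            ConfigurableCheckpointRecoveryFactory factory = Class.forName(className)
                    .asSubclass(ConfigurableCheckpointRecoveryFactory.class)
                    .getDeclaredConstructor()
                    .newInstance();
            factory.configure(config, executor);
            return factory;
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot instantiate " + className, e);
        }
    }

    public static void main(String[] args) {
        Configuration config = new Configuration();
        config.setString("example.metadata-store", "dynamodb");
        Executor directExecutor = Runnable::run;
        ExampleFactory factory = (ExampleFactory)
                instantiate(ExampleFactory.class.getName(), config, directExecutor);
        System.out.println(factory.metadataStore);
    }
}
```

Because the loader only ever needs a no-argument constructor, it does not 
have to know anything about the constructor signatures of user classes.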

The ConfigurableCheckpointRecoveryFactory would be instantiated by a 
ConfigurableCheckpointRecoveryFactoryLoader, invoked from a new abstract class 
that implements HighAvailabilityServices and overrides 
getCheckpointRecoveryFactory(). This new abstract class would be the parent 
class for YarnHighAvailabilityServices, StandaloneHaServices, and 
EmbeddedHaServices.

I hope this approach won't be too invasive for the existing 
StandaloneCheckpointRecoveryFactory, since the configure() method would be a 
no-op for it, and it would not change how a JobManager uses the 
CheckpointRecoveryFactory to create a CompletedCheckpointStore in the 
ExecutionGraph.
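
Roughly, the loader and the new abstract class could look like the sketch 
below. Everything here is a simplified stand-in and not Flink's actual code: 
a Map replaces Configuration, HighAvailabilityServices is reduced to one 
method, AbstractConfigurableHaServices and CustomFactory are invented names, 
and the key "checkpoint-recovery-factory.class" is hypothetical since no 
configuration key has been chosen yet.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Executor;

final class HaServicesSketch {

    // Hypothetical configuration key; the proposal does not name one yet.
    static final String FACTORY_CLASS_KEY = "checkpoint-recovery-factory.class";

    /** Stand-in for the proposed interface; a Map replaces Flink's Configuration. */
    interface ConfigurableCheckpointRecoveryFactory {
        void configure(Map<String, String> config, Executor executor);
    }

    /** Stand-in for the existing default factory: configure() is a no-op. */
    static final class StandaloneCheckpointRecoveryFactory
            implements ConfigurableCheckpointRecoveryFactory {
        @Override
        public void configure(Map<String, String> config, Executor executor) {
            // Nothing to configure for the default factory.
        }
    }

    /** Hypothetical user-provided factory that remembers it was configured. */
    static final class CustomFactory implements ConfigurableCheckpointRecoveryFactory {
        boolean configured;

        @Override
        public void configure(Map<String, String> config, Executor executor) {
            configured = true;
        }
    }

    /** Sketch of the loader: falls back to the standalone factory if no class is set. */
    static final class ConfigurableCheckpointRecoveryFactoryLoader {
        static ConfigurableCheckpointRecoveryFactory load(
                Map<String, String> config, Executor executor)
                throws ReflectiveOperationException {
            String className = config.get(FACTORY_CLASS_KEY);
            ConfigurableCheckpointRecoveryFactory factory;
            if (className == null) {
                factory = new StandaloneCheckpointRecoveryFactory();
            } else {
                factory = Class.forName(className)
                        .asSubclass(ConfigurableCheckpointRecoveryFactory.class)
                        .getDeclaredConstructor()
                        .newInstance();
            }
            factory.configure(config, executor);
            return factory;
        }
    }

    /** Stand-in for HighAvailabilityServices, reduced to the one relevant method. */
    interface HighAvailabilityServices {
        ConfigurableCheckpointRecoveryFactory getCheckpointRecoveryFactory();
    }

    /** The proposed common parent: the only place that talks to the loader. */
    abstract static class AbstractConfigurableHaServices implements HighAvailabilityServices {
        private final Map<String, String> config;
        private final Executor executor;

        protected AbstractConfigurableHaServices(Map<String, String> config, Executor executor) {
            this.config = new HashMap<>(config);
            this.executor = executor;
        }

        @Override
        public ConfigurableCheckpointRecoveryFactory getCheckpointRecoveryFactory() {
            try {
                return ConfigurableCheckpointRecoveryFactoryLoader.load(config, executor);
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException("Could not load the configured factory", e);
            }
        }
    }

    /** Stand-in for one concrete service; it inherits the factory lookup unchanged. */
    static final class StandaloneHaServices extends AbstractConfigurableHaServices {
        StandaloneHaServices(Map<String, String> config, Executor executor) {
            super(config, executor);
        }
    }
}
```

With no class configured, the sketch hands back the standalone factory, so 
existing deployments would behave as they do today.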

> Enable user-provided, custom CheckpointRecoveryFactory for non-HA deployments
> -----------------------------------------------------------------------------
>
>                 Key: FLINK-9114
>                 URL: https://issues.apache.org/jira/browse/FLINK-9114
>             Project: Flink
>          Issue Type: Improvement
>          Components: Configuration, State Backends, Checkpointing
>            Reporter: Jacob Park
>            Assignee: Jacob Park
>            Priority: Major
>
> When you operate a Flink application that uses externalized checkpoints to 
> S3, it becomes difficult to determine which checkpoint is the latest to 
> recover from. Because S3 provides read-after-write consistency only for PUTs, 
> listing an S3 path is not guaranteed to be consistent, so we cannot know 
> which checkpoint to recover from.
> The goal of this improvement is to allow users to provide a custom 
> CheckpointRecoveryFactory for non-HA deployments, so that we can fail a 
> checkpoint when we cannot guarantee that we will know where it will be in 
> S3, and co-publish checkpoint metadata to a strongly consistent data store.
> I propose the following changes:
>  # Modify AbstractNonHaServices and StandaloneHaServices to accept an 
> Executor for the custom CheckpointRecoveryFactory.
>  # Create a CheckpointRecoveryFactoryLoader to provide the custom 
> CheckpointRecoveryFactory from configurations.
>  # Add new configurations for this feature.
> We considered the pluggable StateBackend and a potential pluggable 
> HighAvailabilityServices. These were too convoluted to solve our problem, so 
> we would like a custom CheckpointRecoveryFactory instead.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
