[jira] [Issue Comment Deleted] (FLINK-10724) Refactor failure handling in check point coordinator

vinoyang (JIRA) Mon, 10 Dec 2018 07:49:21 -0800


     [ https://issues.apache.org/jira/browse/FLINK-10724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


vinoyang updated FLINK-10724:
-----------------------------
    Comment: was deleted

(was: The main failure reasons are listed below:
{code:java}
// Proposed constants of the CheckpointFailureReason enum
CheckpointExpired("Checkpoint expired before completing"),
CheckpointSubsumed("Checkpoint has been subsumed"),
CheckpointDeclined("Checkpoint was declined (tasks not ready)"),
CheckpointError("Checkpoint failed");
{code}
They could be defined as enum constants in {{CheckpointFailureReason}}.

Analogous to {{CheckpointTriggerResult}}, I also suggest introducing a class, 
for example named {{CheckpointInvokeResult}}, which contains a 
{{CheckpointFailureReason}} and represents the result of invoking a checkpoint.

Since we also want to include savepoint trigger results when counting 
failures, the {{CheckpointFailureManager}} would handle both 
{{CheckpointTriggerResult}} and {{CheckpointInvokeResult}}.

What do you think, [~azagrebin] and [~till.rohrmann]?

 )
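The proposal in the deleted comment can be sketched in Java roughly as follows. This is a minimal, hypothetical illustration, not Flink's actual code: the four constants and their messages come from the list above, and {{CheckpointInvokeResult}} is only the name suggested in the comment, while the message accessor, the success/failure factory methods and the demo class are assumptions made for the example.

```java
import java.util.Objects;
import java.util.Optional;

// Hypothetical sketch: constants and messages are taken from the comment above;
// the message field and accessor are assumptions.
enum CheckpointFailureReason {
    CheckpointExpired("Checkpoint expired before completing"),
    CheckpointSubsumed("Checkpoint has been subsumed"),
    CheckpointDeclined("Checkpoint was declined (tasks not ready)"),
    CheckpointError("Checkpoint failed");

    private final String message;

    CheckpointFailureReason(String message) {
        this.message = message;
    }

    String message() {
        return message;
    }
}

// Hypothetical result of an asynchronously invoked checkpoint, modelled loosely on
// the idea behind CheckpointTriggerResult: either success or a classified failure.
final class CheckpointInvokeResult {
    private final CheckpointFailureReason failureReason; // null means success

    private CheckpointInvokeResult(CheckpointFailureReason failureReason) {
        this.failureReason = failureReason;
    }

    static CheckpointInvokeResult success() {
        return new CheckpointInvokeResult(null);
    }

    static CheckpointInvokeResult failure(CheckpointFailureReason reason) {
        return new CheckpointInvokeResult(Objects.requireNonNull(reason, "reason"));
    }

    boolean isSuccess() {
        return failureReason == null;
    }

    Optional<CheckpointFailureReason> getFailureReason() {
        return Optional.ofNullable(failureReason);
    }
}

class CheckpointInvokeResultDemo {
    public static void main(String[] args) {
        CheckpointInvokeResult result =
                CheckpointInvokeResult.failure(CheckpointFailureReason.CheckpointExpired);
        // Prints "Checkpoint expired before completing"
        result.getFailureReason().ifPresent(r -> System.out.println(r.message()));
    }
}
```

Keeping the reason inside a small immutable result object lets the synchronous trigger path and the asynchronous invoke path report failures in the same shape, which is what a single failure manager would consume.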

> Refactor failure handling in check point coordinator
> ----------------------------------------------------
>
>                 Key: FLINK-10724
>                 URL: https://issues.apache.org/jira/browse/FLINK-10724
>             Project: Flink
>          Issue Type: Improvement
>          Components: State Backends, Checkpointing
>            Reporter: Andrey Zagrebin
>            Assignee: vinoyang
>            Priority: Major
>
> At the moment, failure handling of asynchronously triggered checkpoints in 
> the checkpoint coordinator happens in several different places. We could 
> organise it in a similar way to the failure handling of synchronously 
> triggered checkpoints in *CheckpointTriggerResult*, where we classify the 
> error cases. This would simplify, e.g., the integration of an error counter 
> for FLINK-10074.
> See also discussion here: [https://github.com/apache/flink/pull/6567]
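Once every failure is classified and reported to one place, an error counter like the one discussed for FLINK-10074 becomes a small, isolated component. The following is a minimal sketch under stated assumptions, not Flink's API: the class name {{CheckpointFailureCounter}}, the stand-in {{Reason}} enum, the "consecutive failures, reset on success" semantics and the decision to ignore subsumed checkpoints are all hypothetical choices for illustration.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical failure counter; names and counting policy are assumptions.
final class CheckpointFailureCounter {

    // Minimal stand-in for a classified failure reason (hypothetical names).
    enum Reason { EXPIRED, SUBSUMED, DECLINED, ERROR }

    // Assumption: a subsumed checkpoint was superseded by a newer one,
    // so it is not counted as a real failure.
    private static final Set<Reason> IGNORED = EnumSet.of(Reason.SUBSUMED);

    private final int tolerableFailures;
    private int consecutiveFailures;

    CheckpointFailureCounter(int tolerableFailures) {
        if (tolerableFailures < 0) {
            throw new IllegalArgumentException("tolerableFailures must be >= 0");
        }
        this.tolerableFailures = tolerableFailures;
    }

    /** Records a classified failure; returns true once the tolerated number is exceeded. */
    boolean onFailure(Reason reason) {
        if (!IGNORED.contains(reason)) {
            consecutiveFailures++;
        }
        return consecutiveFailures > tolerableFailures;
    }

    /** A completed checkpoint resets the consecutive-failure count. */
    void onSuccess() {
        consecutiveFailures = 0;
    }

    int consecutiveFailures() {
        return consecutiveFailures;
    }
}
```

With `new CheckpointFailureCounter(1)`, one expired checkpoint is tolerated, and a second counted failure before any successful checkpoint makes `onFailure` return `true`.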



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
