[ https://issues.apache.org/jira/browse/FLINK-10724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
vinoyang updated FLINK-10724:
-----------------------------
Comment: was deleted
(was: The main failure reasons are listed below:
{code:java}
CheckpointExpired("Checkpoint expired before completing")
CheckpointSubsumed("Checkpoint has been subsumed")
CheckpointDeclined("Checkpoint was declined (tasks not ready)")
CheckpointError("Checkpoint failed")
{code}
They could be defined as enum values in {{CheckpointFailureReason}}.
Like {{CheckpointTriggerResult}}, I also suggest introducing a class, for
example named {{CheckpointInvokeResult}}, which contains a
{{CheckpointFailureReason}} and represents the invoke result.
When we count the number of failures, we also want to include the trigger
result of savepoints, so the {{CheckpointFailureManager}} would respond to
both {{CheckpointTriggerResult}} and {{CheckpointInvokeResult}}.
What do you think, [~azagrebin] and [~till.rohrmann]?
)
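The proposal in the deleted comment could look roughly like the sketch below. It is only an illustration under stated assumptions, not code from the Flink repository: the type names {{CheckpointFailureReason}} and {{CheckpointInvokeResult}} and the four messages come from the comment, while the enum constant spelling, the factory methods {{success()}} / {{failure(...)}} and the accessors are hypothetical.

```java
import java.util.Objects;
import java.util.Optional;

/** Failure reasons named in the comment; the constant spelling is an assumption. */
enum CheckpointFailureReason {
    CHECKPOINT_EXPIRED("Checkpoint expired before completing"),
    CHECKPOINT_SUBSUMED("Checkpoint has been subsumed"),
    CHECKPOINT_DECLINED("Checkpoint was declined (tasks not ready)"),
    CHECKPOINT_ERROR("Checkpoint failed");

    private final String message;

    CheckpointFailureReason(String message) {
        this.message = message;
    }

    /** Human-readable description of the failure. */
    String message() {
        return message;
    }
}

/** Hypothetical result type: either a success or a failure carrying its reason. */
final class CheckpointInvokeResult {
    // null means the checkpoint invocation succeeded
    private final CheckpointFailureReason failureReason;

    private CheckpointInvokeResult(CheckpointFailureReason failureReason) {
        this.failureReason = failureReason;
    }

    /** Hypothetical factory for a successful invocation. */
    static CheckpointInvokeResult success() {
        return new CheckpointInvokeResult(null);
    }

    /** Hypothetical factory for a failed invocation; the reason must not be null. */
    static CheckpointInvokeResult failure(CheckpointFailureReason reason) {
        return new CheckpointInvokeResult(Objects.requireNonNull(reason, "reason"));
    }

    boolean isSuccess() {
        return failureReason == null;
    }

    /** Empty for a success, otherwise the classified failure reason. */
    Optional<CheckpointFailureReason> failureReason() {
        return Optional.ofNullable(failureReason);
    }
}
```

With a shape like this, a failure manager could count failures by inspecting {{failureReason()}} without caring where in the coordinator the failure was detected.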
> Refactor failure handling in check point coordinator
> ----------------------------------------------------
>
> Key: FLINK-10724
> URL: https://issues.apache.org/jira/browse/FLINK-10724
> Project: Flink
> Issue Type: Improvement
> Components: State Backends, Checkpointing
> Reporter: Andrey Zagrebin
> Assignee: vinoyang
> Priority: Major
>
> At the moment, failure handling of asynchronously triggered checkpoints in
> the checkpoint coordinator happens in different places. We could organise it
> in a similar way to the failure handling of synchronous checkpoint triggering
> in *CheckpointTriggerResult*, where we classify error cases. This will
> simplify, e.g., the integration of an error counter for FLINK-10074.
> See also discussion here: [https://github.com/apache/flink/pull/6567]
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)