[jira] [Updated] (FLINK-18336) CheckpointFailureManager forgets failed checkpoints after a successful one

Roman Khachatryan (Jira) Thu, 18 Jun 2020 01:00:22 -0700


     [ 
https://issues.apache.org/jira/browse/FLINK-18336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Roman Khachatryan updated FLINK-18336:
--------------------------------------
    Description: 
As I understand it, a failure should not be counted more than once for a single 
checkpoint.

However, after a successful checkpoint, the record of all previous failures is 
cleared, so a late failure report for a checkpoint that has already failed is 
counted again.

So this test currently fails:

 
{code:java}
TestFailJobCallback callback = new TestFailJobCallback();
CheckpointFailureManager failureManager = new CheckpointFailureManager(2, callback);

// two failures, then a success that resets the failure tracking
failureManager.handleJobLevelCheckpointException(new CheckpointException(CHECKPOINT_EXPIRED), 1L);
failureManager.handleJobLevelCheckpointException(new CheckpointException(CHECKPOINT_EXPIRED), 2L);
failureManager.handleCheckpointSuccess(2L);

// two new failures after the success
failureManager.handleJobLevelCheckpointException(new CheckpointException(CHECKPOINT_EXPIRED), 3L);
failureManager.handleJobLevelCheckpointException(new CheckpointException(CHECKPOINT_EXPIRED), 4L);

// shouldn't be counted because 1L has already failed:
failureManager.handleJobLevelCheckpointException(new CheckpointException(CHECKPOINT_EXPIRED), 1L);

// the callback must not have been invoked
assertEquals(0, callback.getInvokeCounter());
{code}
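
To make the scenario easier to follow outside of Flink, here is a minimal, self-contained sketch of the counting behaviour described above. It is only a model under stated assumptions, not the actual CheckpointFailureManager code: the class FailureCounterSketch, its methods, and the forgetIdsOnSuccess switch are all hypothetical names introduced for illustration, and the model assumes the job is failed once the number of counted failures exceeds the tolerable limit. Keeping the already-counted IDs across a success is shown as one possible way to get the expected result, not necessarily the fix chosen for this issue.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical, simplified model of the failure counting; NOT Flink's real class. */
public class FailureCounterSketch {
    private final int tolerableFailures;
    // Assumption for illustration: toggles between the reported and the expected behaviour.
    private final boolean forgetIdsOnSuccess;
    private final Set<Long> countedIds = new HashSet<>();
    private int continuousFailures = 0;
    private int failJobInvocations = 0;

    public FailureCounterSketch(int tolerableFailures, boolean forgetIdsOnSuccess) {
        this.tolerableFailures = tolerableFailures;
        this.forgetIdsOnSuccess = forgetIdsOnSuccess;
    }

    public void onFailure(long checkpointId) {
        // Count each checkpoint ID at most once while it is remembered.
        if (countedIds.add(checkpointId)) {
            continuousFailures++;
            if (continuousFailures > tolerableFailures) {
                failJobInvocations++; // stands in for the fail-job callback
            }
        }
    }

    public void onSuccess(long checkpointId) {
        // A success always resets the continuous failure counter.
        continuousFailures = 0;
        if (forgetIdsOnSuccess) {
            // Reported behaviour: already-failed IDs are forgotten as well.
            countedIds.clear();
        }
    }

    public int getFailJobInvocations() {
        return failJobInvocations;
    }

    /** Replays the sequence from the test in this issue with a limit of 2. */
    public static int replayScenario(boolean forgetIdsOnSuccess) {
        FailureCounterSketch manager = new FailureCounterSketch(2, forgetIdsOnSuccess);
        manager.onFailure(1L);
        manager.onFailure(2L);
        manager.onSuccess(2L);
        manager.onFailure(3L);
        manager.onFailure(4L);
        manager.onFailure(1L); // late, repeated failure for checkpoint 1
        return manager.getFailJobInvocations();
    }

    public static void main(String[] args) {
        System.out.println("forget IDs on success:   " + replayScenario(true));
        System.out.println("remember IDs on success: " + replayScenario(false));
    }
}
```

In this model, forgetting the IDs lets the repeated failure of checkpoint 1 push the counter to 3, which exceeds the limit of 2 and triggers the callback once; remembering them leaves the counter at 2 and the callback untouched. A real implementation would also need to bound the set of remembered IDs, which this sketch does not attempt.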
 

> CheckpointFailureManager forgets failed checkpoints after a successful one
> --------------------------------------------------------------------------
>
>                 Key: FLINK-18336
>                 URL: https://issues.apache.org/jira/browse/FLINK-18336
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Checkpointing
>            Reporter: Roman Khachatryan
>            Assignee: Roman Khachatryan
>            Priority: Major
>              Labels: pull-request-available



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
