[jira] [Commented] (FLINK-7783) Don't always remove checkpoints in ZooKeeperCompletedCheckpointStore#recover()

ASF GitHub Bot (JIRA) Thu, 19 Oct 2017 23:17:21 -0700

    [ 
https://issues.apache.org/jira/browse/FLINK-7783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16212237#comment-16212237
 ]


ASF GitHub Bot commented on FLINK-7783:
---------------------------------------

Github user StefanRRichter commented on a diff in the pull request:

    https://github.com/apache/flink/pull/4863#discussion_r145885278
  
    --- Diff: 
flink-runtime/src/test/java/org/apache/flink/runtime/checkpoint/ZooKeeperCompletedCheckpointStoreTest.java
 ---
    @@ -183,11 +179,120 @@ public Void answer(InvocationOnMock invocation) 
throws Throwable {
                // check that we did not discard any of the state handles which 
were retrieved
                verify(retrievableStateHandle1, never()).discardState();
                verify(retrievableStateHandle2, never()).discardState();
    +   }
    +
    +   /**
    +    * Tests that the completed checkpoint store can retrieve all 
checkpoints stored in ZooKeeper
    +    * and ignores those which cannot be retrieved via their state handles.
    +    */
    +   @Test
    +   public void testCheckpointRecoveryWithBrokenCheckpoints() throws 
Exception {
    +           final List<Tuple2<RetrievableStateHandle<CompletedCheckpoint>, 
String>> checkpointsInZooKeeper = new ArrayList<>(4);
    +
    +           final CompletedCheckpoint completedCheckpoint1 = 
mock(CompletedCheckpoint.class);
    +           when(completedCheckpoint1.getCheckpointID()).thenReturn(1L);
    +           final CompletedCheckpoint completedCheckpoint2 = 
mock(CompletedCheckpoint.class);
    +           when(completedCheckpoint2.getCheckpointID()).thenReturn(2L);
    +
    +           final Collection<Long> expectedCheckpointIds = new HashSet<>(2);
    +           expectedCheckpointIds.add(1L);
    +           expectedCheckpointIds.add(2L);
    +
    +           final RetrievableStateHandle<CompletedCheckpoint> 
failingRetrievableStateHandle = mock(RetrievableStateHandle.class);
    +           
when(failingRetrievableStateHandle.retrieveState()).thenThrow(new 
IOException("Test exception"));
    +
    +           final RetrievableStateHandle<CompletedCheckpoint> 
retrievableStateHandle1 = mock(RetrievableStateHandle.class);
    +           
when(retrievableStateHandle1.retrieveState()).thenReturn(completedCheckpoint1);
    +
    +           final RetrievableStateHandle<CompletedCheckpoint> 
retrievableStateHandle2 = mock(RetrievableStateHandle.class);
    +           
when(retrievableStateHandle2.retrieveState()).thenReturn(completedCheckpoint2);
    +
    +           
checkpointsInZooKeeper.add(Tuple2.of(failingRetrievableStateHandle, 
"/a_failing1"));
    +           checkpointsInZooKeeper.add(Tuple2.of(retrievableStateHandle1, 
"/b_foobar1"));
    +           
checkpointsInZooKeeper.add(Tuple2.of(failingRetrievableStateHandle, 
"/c_failing2"));
    +           checkpointsInZooKeeper.add(Tuple2.of(retrievableStateHandle2, 
"/d_foobar2"));
    +
    +           final CuratorFramework client = mock(CuratorFramework.class, 
Mockito.RETURNS_DEEP_STUBS);
    +           final RetrievableStateStorageHelper<CompletedCheckpoint> 
storageHelperMock = mock(RetrievableStateStorageHelper.class);
    +
    +           ZooKeeperStateHandleStore<CompletedCheckpoint> 
zooKeeperStateHandleStoreMock = spy(new ZooKeeperStateHandleStore<>(client, 
storageHelperMock, Executors.directExecutor()));
    +           
whenNew(ZooKeeperStateHandleStore.class).withAnyArguments().thenReturn(zooKeeperStateHandleStoreMock);
    +           
doReturn(checkpointsInZooKeeper).when(zooKeeperStateHandleStoreMock).getAllSortedByNameAndLock();
    +
    +           final int numCheckpointsToRetain = 1;
    +
    +           // Mocking for the delete operation on the CuratorFramework 
client
    +           // It assures that the callback is executed synchronously
    +
    +           final EnsurePath ensurePathMock = mock(EnsurePath.class);
    --- End diff --
    
    I started to become very critical about excessive mocking in tests. I 
wonder if we could just have true dummy implementations because most (or all) 
the mocked objects have clear interfaces. We can then reuse the dummies in all 
test. A smaller case for this could also be `RetrievableStateHandle` that 
throws exception or returns an object, instead of doing the repetitive mocking 
thing.
    
    I had quiet some thoughts on this and even consider giving a lunchtalk 
about my view on the problem. In a nutshell, a test should verify that a 
contract works as expected and should not care about implementation details, 
e.g. how the contract is fulfilled. With mocking, we hardwire the "how". Tests 
should guard against introducing failures, so one half is showing new failure. 
But the other half is, that tests should not indicate failures as long as the 
contract is not violated and the test should be valid as is. 
    
    However, with mocks, test can fail when implementation details diverge from 
the hardwired expectation even if the contract is not violated. So half of the 
tests value is lost and the test must constantly be refactored with the code to 
get it green again, thereby also giving potential to introduce errors in the 
test that cover errors in the actual change. Repetitive mocking makes things 
even worse.


> Don't always remove checkpoints in ZooKeeperCompletedCheckpointStore#recover()
> ------------------------------------------------------------------------------
>
>                 Key: FLINK-7783
>                 URL: https://issues.apache.org/jira/browse/FLINK-7783
>             Project: Flink
>          Issue Type: Sub-task
>          Components: State Backends, Checkpointing
>    Affects Versions: 1.4.0, 1.3.2
>            Reporter: Aljoscha Krettek
>            Assignee: Aljoscha Krettek
>            Priority: Blocker
>             Fix For: 1.4.0, 1.3.3
>
>
> Currently, we always delete checkpoint handles if they (or the data from the 
> DFS) cannot be read: 
> https://github.com/apache/flink/blob/91a4b276171afb760bfff9ccf30593e648e91dfb/flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/ZooKeeperCompletedCheckpointStore.java#L180
> This can lead to problems in case the DFS is temporarily not available, i.e. 
> we could inadvertently
> delete all checkpoints even though they are still valid.
> A user reported this problem on the mailing list: 
> https://lists.apache.org/thread.html/9dc9b719cf8449067ad01114fedb75d1beac7b4dff171acdcc24903d@%3Cuser.flink.apache.org%3E



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (FLINK-7783) Don't always remove checkpoints in ZooKeeperCompletedCheckpointStore#recover()

Reply via email to