GitHub user zentol opened a pull request:

    https://github.com/apache/flink/pull/6395

    [FLINK-9900][tests] Harden ZooKeeperHighAvailabilityITCase

    ## What is the purpose of the change
    
    This PR makes a few modifications to the `ZooKeeperHighAvailabilityITCase` 
to reduce the chances for intermittent test failures and timeouts.
    
    Changes:
    ## 1)
    The test was moving files out of the HA storage directory with a simple 
loop using `File#renameTo`. The test enforced that every move succeeded; 
however, since old checkpoints may be deleted asynchronously, this may not 
always be the case.
    We now use a `FileVisitor` and ignore `IOExceptions` that occur while 
moving.
    If no checkpoint file could be moved the test will still fail.
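
A minimal sketch of the approach described in 1), not the code from this PR: the class name `CheckpointFileMover` and the method `moveFiles` are hypothetical, and the sketch assumes the target directory lies outside the source directory. It walks the tree with a `SimpleFileVisitor`, ignores any `IOException` raised for an individual file, and returns the number of files moved so the caller can still fail if nothing was moved.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper; names are illustrative and not taken from the PR.
final class CheckpointFileMover {

    /**
     * Moves every regular file below {@code source} into {@code target},
     * preserving relative paths. Failures for single files are ignored,
     * since a file may be deleted concurrently. Returns the number of
     * files that were actually moved.
     */
    static int moveFiles(Path source, Path target) throws IOException {
        Files.createDirectories(target);
        AtomicInteger moved = new AtomicInteger();

        Files.walkFileTree(source, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                try {
                    Path destination = target.resolve(source.relativize(file));
                    Files.createDirectories(destination.getParent());
                    Files.move(file, destination);
                    moved.incrementAndGet();
                } catch (IOException ignored) {
                    // The file may have been removed asynchronously; skip it.
                }
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult visitFileFailed(Path file, IOException exc) {
                // The file vanished before it could be visited; keep going.
                return FileVisitResult.CONTINUE;
            }
        });
        return moved.get();
    }
}
```

A test would then assert that the returned count is greater than zero rather than requiring every single move to succeed.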
    
    ## 2)
    After the checkpoint files were moved out of the HA storage directory the 
job is thrown into a restart loop. To verify the restart behavior the test was 
polling the job state and checking for the `RESTARTING` and `FAILING` states.
    Because the job is small, it is in these states only for a short time, 
which effectively adds a race condition. Thus this loop may run for longer than 
anticipated; the largest outlier I got locally was 50 seconds, which isn't 
_that_ far off from the 2 minute timeout. I suspect this to be the failure 
cause raised in the JIRA, but I can't guarantee it.
    Instead we now access the `fullRestarts` metric using a custom reporter to 
check how many restarts have occurred. The actual _state transitions_ should be 
irrelevant to the test.
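
The idea in 2) can be illustrated with a dependency-free sketch. This is not Flink's reporter API and not the code from this PR: the `LongSupplier` stands in for whatever the custom reporter exposes for the `fullRestarts` metric, and `RestartCountWaiter`, `awaitRestarts`, and the 50 ms poll interval are assumptions made for illustration. The point is that a monotonically increasing counter cannot be missed between polls, whereas a short-lived job state can.

```java
import java.util.function.LongSupplier;

// Hypothetical helper; a stand-in for reading the metric via a reporter.
final class RestartCountWaiter {

    /**
     * Polls {@code fullRestarts} until it reports at least {@code expected}
     * restarts or the timeout elapses. Returns true if the expected count
     * was reached in time, false otherwise.
     */
    static boolean awaitRestarts(LongSupplier fullRestarts, long expected, long timeoutMillis)
            throws InterruptedException {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (fullRestarts.getAsLong() < expected) {
            if (System.nanoTime() - deadline >= 0) {
                return false; // timed out before enough restarts were observed
            }
            Thread.sleep(50); // assumed poll interval
        }
        return true;
    }
}
```

Because the counter only ever grows, the check succeeds no matter how briefly the job stayed in any intermediate state.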

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/zentol/flink 9900

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/6395.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #6395
    
----
commit b8827dc3723558c52ad567bf88f24ae34129ea08
Author: zentol <chesnay@...>
Date:   2018-07-23T14:21:32Z

    [FLINK-9900][tests] Harden ZooKeeperHighAvailabilityITCase

----

