GitHub user StefanRRichter opened a pull request:

    https://github.com/apache/flink/pull/4855

    [FLINK-5372] [tests] Fix RocksDBAsyncSnapshotTest.testCancelFullyAsyn…

    …cCheckpoints()
    
    ## What is the purpose of the change
    
    This PR fixes `RocksDBAsyncSnapshotTest.testCancelFullyAsyncCheckpoints()`, 
so that the test can be reactivated.
    
    Furthermore, it also fixes some problems in the same test class related to 
unreleased native RocksDB resources (missing calls to 
`RocksDBKeyedStateBackend::dispose()`). I did a search for similar problems by 
placing log statements for instance creation and dispose calls in the log and 
ensured that there are always matching pairs. I did this for the rocksdb tests 
and flink-tests - found and fixed one more case in 
`RocksDBStateBackendConfigTest`.
    
    ## Brief change log
    
    First problem is, that the test uses a checkpoint stream factory that 
produces streams that block on latches, then wants to call close() while the 
stream is supposed to be blocked in an async thread executing and writing the 
rocksdb snapshot. While the test fails is, that the first stream is actually 
used to take s snapshot of the timer service into raw keyed state. The timer 
service is not async and therefore the test blocks and cannot reach close(). I 
will fix the test through a different factory, that gives a blocking stream on 
the second call, i.e. for the request that is actually from the keyed backend.
    For the second problem, this was a bit nasty to reproduce but simple in 
cause and solution. The test testCleanupOfSnapshotsInFailureCase created an 
instance of RocksDBKeyedStateBackend, but never called the dispose method to 
release native resources. Since the latest RocksDB update, their code has 
assertions in place that are executed in finalize() and will detect leaking 
native resources. This happens only when the resource is GC'ed, and this 
explains why you could only observe the test crash when it was executed with 
more tests, eventually reaching a GC.
    
    *Changes*
      - *`RocksDBAsyncSnapshotTest.testCancelFullyAsyncCheckpoints()` used a 
checkpoint stream factory that produces streams that block on latches, then 
wants to call `close()` while the stream is supposed to be blocked in an async 
thread executing and writing the rocksdb snapshot. The test fails because the 
first created stream is actually used to take a snapshot of the timer service 
into raw keyed state. The timer service is not async and therefore the test 
blocks and cannot reach `close()`. The fix introduces a different factory that 
gives a blocking stream on the second call, i.e. for the request that is 
actually from the keyed backend.*
    
      - *Introduces some missing calls to `RocksDBKeyedStateBackend::dispose()`*
    
    
    ## Verifying this change
    `RocksDBAsyncSnapshotTest.testCancelFullyAsyncCheckpoints()` is reactivated 
and works.
    
    ## Does this pull request potentially affect one of the following parts:
    
      - Dependencies (does it add or upgrade a dependency): (no)
      - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (no)
      - The serializers: (no)
      - The runtime per-record code paths (performance sensitive): (no)
      - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Yarn/Mesos, ZooKeeper: (no)
    
    ## Documentation
    
      - Does this pull request introduce a new feature? (no)
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/StefanRRichter/flink rocksdb-async-test

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/4855.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #4855
    
----
commit 15a4c7413f002d1ae336c076f1d82ac896f8c9f4
Author: Stefan Richter <[email protected]>
Date:   2017-10-19T05:10:21Z

    [FLINK-5372] [tests] Fix 
RocksDBAsyncSnapshotTest.testCancelFullyAsyncCheckpoints()

----


---

Reply via email to