Github user StefanRRichter commented on the issue:
https://github.com/apache/flink/pull/3801
I am sorry, but before merging I noticed that some tests (e.g.
`RocksDBStateBackendTest.testCancelRunningSnapshot`) fail sporadically (only on
Travis). I tracked the problem down and I think the cause is that the streams
are not closed eagerly in `cancel()` to interrupt blocking IO calls.
I suggest the following fix:
`RocksDBIncrementalSnapshotOperation` should have its own
`CloseableRegistry`. This tracks all the open streams inside the checkpointing
and is registered with the backend's registry for as long as the task runs.
Then, in cancel, as a first step we can close and unregister that inner
`CloseableRegistry`. This also prevents a race in which the current stream gets
closed asynchronously by `cancel()` while the checkpointing has actually already
opened the next stream (once it is closed, the registry rejects and closes any
new stream on registration).
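
To make the suggestion concrete, here is a minimal sketch of the idea. It does not use Flink's actual classes: `SketchCloseableRegistry` and `SketchSnapshotOperation` are hypothetical stand-ins written only for illustration, and the real `CloseableRegistry` API may differ in names and details.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical stand-in for a closeable registry; not Flink's real class. */
class SketchCloseableRegistry implements Closeable {
    private final Set<Closeable> open = new HashSet<>();
    private boolean closed = false;

    /** Tracks a closeable. Once the registry is closed, new ones are closed and rejected. */
    synchronized void register(Closeable c) throws IOException {
        if (closed) {
            c.close();
            throw new IOException("Registry already closed");
        }
        open.add(c);
    }

    synchronized boolean unregister(Closeable c) {
        return open.remove(c);
    }

    /** Closes everything that is currently registered and blocks later registrations. */
    @Override
    public void close() throws IOException {
        Set<Closeable> toClose;
        synchronized (this) {
            if (closed) {
                return;
            }
            closed = true;
            toClose = new HashSet<>(open);
            open.clear();
        }
        for (Closeable c : toClose) {
            try {
                c.close();
            } catch (IOException ignored) {
                // best effort: keep closing the remaining streams
            }
        }
    }
}

/** Hypothetical snapshot operation that owns an inner registry. */
class SketchSnapshotOperation {
    private final SketchCloseableRegistry backendRegistry;
    private final SketchCloseableRegistry innerRegistry = new SketchCloseableRegistry();

    SketchSnapshotOperation(SketchCloseableRegistry backendRegistry) throws IOException {
        this.backendRegistry = backendRegistry;
        // The inner registry is itself registered with the backend's registry.
        backendRegistry.register(innerRegistry);
    }

    /** Every stream opened during checkpointing goes through the inner registry. */
    void openStream(Closeable stream) throws IOException {
        innerRegistry.register(stream);
    }

    /** First step of cancel: close and unregister the inner registry. */
    void cancel() throws IOException {
        innerRegistry.close();
        backendRegistry.unregister(innerRegistry);
    }
}
```

In this sketch, a stream that the checkpointing tries to open after `cancel()` is closed immediately and the registration fails, so it cannot escape the cancellation.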