[ https://issues.apache.org/jira/browse/FLINK-6364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15997994#comment-15997994 ]
ASF GitHub Bot commented on FLINK-6364:
---------------------------------------
Github user StefanRRichter commented on the issue:
https://github.com/apache/flink/pull/3801
I am sorry, but before merging I noticed that some tests (e.g.
`RocksDBStateBackendTest.testCancelRunningSnapshot`) fail sporadically (only on
Travis). I tracked down the problem, and I think the cause is that `cancel()`
does not eagerly close the streams, so blocking IO calls are not interrupted.
I suggest the following fix:
`RocksDBIncrementalSnapshotOperation` should have its own
`CloseableRegistry`. This registry tracks all streams opened during the
checkpoint and is registered with the backend's registry for as long as the
task runs. Then, as a first step in `cancel()`, we can close and unregister
that inner `CloseableRegistry`. This also prevents a race in which `cancel()`
asynchronously closes the current stream while the checkpointing has already
opened the next one (once the registry is closed, it closes any new stream
that tries to register and rejects the registration).
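A minimal Java sketch of the proposed fix, under stated assumptions: `SimpleCloseableRegistry` and `IncrementalSnapshotSketch` are hypothetical, simplified stand-ins written for illustration, not Flink's actual `CloseableRegistry` or `RocksDBIncrementalSnapshotOperation`. The snapshot operation owns an inner registry and registers it with the backend's registry; `cancel()` first unregisters and closes that inner registry, which closes every open stream, and any stream the checkpointing thread tries to register afterwards is closed and rejected.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for a closeable registry (not Flink's real class).
class SimpleCloseableRegistry implements Closeable {
    private final List<Closeable> closeables = new ArrayList<>();
    private boolean closed = false;

    // Tracks a resource. Once the registry is closed, the argument is closed
    // immediately and the registration is rejected with an IOException.
    void register(Closeable closeable) throws IOException {
        synchronized (this) {
            if (!closed) {
                closeables.add(closeable);
                return;
            }
        }
        closeable.close();
        throw new IOException("Registry is already closed; closed the argument instead.");
    }

    synchronized boolean unregister(Closeable closeable) {
        return closeables.remove(closeable);
    }

    synchronized boolean isClosed() {
        return closed;
    }

    // Closes all tracked resources outside the lock; calling close() again is a no-op.
    @Override
    public void close() throws IOException {
        List<Closeable> toClose;
        synchronized (this) {
            if (closed) {
                return;
            }
            closed = true;
            toClose = new ArrayList<>(closeables);
            closeables.clear();
        }
        IOException firstFailure = null;
        for (Closeable c : toClose) {
            try {
                c.close();
            } catch (IOException e) {
                if (firstFailure == null) {
                    firstFailure = e;
                }
            }
        }
        if (firstFailure != null) {
            throw firstFailure;
        }
    }
}

// Hypothetical snapshot operation that owns a registry for the streams it opens.
class IncrementalSnapshotSketch {
    private final SimpleCloseableRegistry backendRegistry;
    private final SimpleCloseableRegistry snapshotRegistry = new SimpleCloseableRegistry();

    IncrementalSnapshotSketch(SimpleCloseableRegistry backendRegistry) throws IOException {
        this.backendRegistry = backendRegistry;
        // Nested: closing the backend registry also closes this operation's streams.
        backendRegistry.register(snapshotRegistry);
    }

    // Called by the checkpointing thread for every stream it opens.
    <T extends Closeable> T openStream(T stream) throws IOException {
        snapshotRegistry.register(stream); // closes and rejects the stream after cancel()
        return stream;
    }

    // First step of cancellation: unregister and close the inner registry, closing
    // every open stream (which is expected to interrupt blocking IO on it).
    void cancel() throws IOException {
        backendRegistry.unregister(snapshotRegistry);
        snapshotRegistry.close();
    }
}
```

The inner registry is what closes the race: a stream opened after `cancel()` cannot slip through unclosed, because registering with an already closed registry closes the argument and throws, while the backend's registry itself stays open.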
> Implement incremental checkpointing in RocksDBStateBackend
> ----------------------------------------------------------
>
> Key: FLINK-6364
> URL: https://issues.apache.org/jira/browse/FLINK-6364
> Project: Flink
> Issue Type: Sub-task
> Components: State Backends, Checkpointing
> Reporter: Xiaogang Shi
> Assignee: Xiaogang Shi
>
> {{RocksDBStateBackend}} is well suited for incremental checkpointing because
> RocksDB is based on LSM trees, which record updates in new sst files and all
> sst files are immutable. By only materializing those new sst files, we can
> significantly improve the performance of checkpointing.
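The idea in the quoted description, materializing only the sst files that are new since the last checkpoint, can be sketched as a set difference over file names. This is a hypothetical helper (`IncrementalSstDelta` is not a Flink class), and it assumes that sst file names are never reused within one RocksDB instance and that mutable non-sst files such as `MANIFEST` or `CURRENT` are handled separately.

```java
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical helper, not Flink's implementation: picks the sst files that a
// new incremental checkpoint must upload.
final class IncrementalSstDelta {
    private IncrementalSstDelta() {}

    // sst files are immutable, so a file name a previous checkpoint already
    // uploaded still refers to identical content and can be referenced instead
    // of copied again. Only names not seen before need to be materialized.
    static SortedSet<String> newSstFiles(Set<String> currentFiles, Set<String> previouslyUploaded) {
        SortedSet<String> delta = new TreeSet<>();
        for (String file : currentFiles) {
            // Assumption: non-sst files are mutable and handled elsewhere, so they are skipped here.
            if (file.endsWith(".sst") && !previouslyUploaded.contains(file)) {
                delta.add(file);
            }
        }
        return delta;
    }
}
```

For example, with current files `000010.sst`, `000012.sst`, `000013.sst`, `MANIFEST-000014` and previously uploaded `000010.sst`, `000011.sst`, `000012.sst`, the helper returns only `000013.sst`; `000011.sst` (presumably removed by compaction) is simply no longer referenced by the new checkpoint.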
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)