[
https://issues.apache.org/jira/browse/FLINK-8559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Chesnay Schepler updated FLINK-8559:
------------------------------------
Affects Version/s: 1.4.0
> Exceptions in RocksDBIncrementalSnapshotOperation#takeSnapshot cause job to
> get stuck
> -------------------------------------------------------------------------------------
>
> Key: FLINK-8559
> URL: https://issues.apache.org/jira/browse/FLINK-8559
> Project: Flink
> Issue Type: Bug
> Components: State Backends, Checkpointing, Tests
> Affects Versions: 1.4.0, 1.5.0
> Reporter: Chesnay Schepler
> Priority: Blocker
>
> In {{RocksDBKeyedStateBackend#snapshotIncrementally}} we can find this
> code:
>
> {code:java}
> final RocksDBIncrementalSnapshotOperation<K> snapshotOperation =
>     new RocksDBIncrementalSnapshotOperation<>(
>         this,
>         checkpointStreamFactory,
>         checkpointId,
>         checkpointTimestamp);
> snapshotOperation.takeSnapshot();
> return new FutureTask<KeyedStateHandle>(
>     new Callable<KeyedStateHandle>() {
>         @Override
>         public KeyedStateHandle call() throws Exception {
>             return snapshotOperation.materializeSnapshot();
>         }
>     }
> ) {
>     @Override
>     public boolean cancel(boolean mayInterruptIfRunning) {
>         snapshotOperation.stop();
>         return super.cancel(mayInterruptIfRunning);
>     }
>     @Override
>     protected void done() {
>         snapshotOperation.releaseResources(isCancelled());
>     }
> };
> {code}
> In the constructor of {{RocksDBIncrementalSnapshotOperation}} we call
> {{acquireResource()}} on the RocksDB {{ResourceGuard}}. If
> {{snapshotOperation.takeSnapshot()}} fails with an exception, these resources
> are never released. When the task is shut down due to the exception, it will
> get stuck on releasing RocksDB.
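
One possible way to close the leak described above is to release the resources whenever takeSnapshot() throws, before the exception propagates. The sketch below only illustrates that pattern and is not the actual Flink fix: LeaseCounter, SnapshotOp and SnapshotSketch are hypothetical stand-ins for ResourceGuard, RocksDBIncrementalSnapshotOperation and the state backend, and the failOnTake flag exists only to simulate a failing snapshot.

```java
import java.util.concurrent.FutureTask;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a lease-counting guard; NOT Flink's ResourceGuard. */
final class LeaseCounter {
    private final AtomicInteger leases = new AtomicInteger();

    void acquire() { leases.incrementAndGet(); }

    void release() { leases.decrementAndGet(); }

    int outstanding() { return leases.get(); }
}

/** Hypothetical snapshot operation that acquires a lease in its constructor. */
final class SnapshotOp {
    private final LeaseCounter guard;
    private final boolean failOnTake; // assumption: test hook to simulate a failure
    private boolean released;

    SnapshotOp(LeaseCounter guard, boolean failOnTake) {
        this.guard = guard;
        this.failOnTake = failOnTake;
        guard.acquire(); // mirrors the acquire call made in the real constructor
    }

    void takeSnapshot() throws Exception {
        if (failOnTake) {
            throw new Exception("simulated takeSnapshot failure");
        }
    }

    String materializeSnapshot() { return "handle"; }

    /** Idempotent, so the failure path and done() cannot release the lease twice. */
    synchronized void releaseResources() {
        if (!released) {
            released = true;
            guard.release();
        }
    }
}

final class SnapshotSketch {
    static FutureTask<String> snapshot(LeaseCounter guard, boolean failOnTake) throws Exception {
        final SnapshotOp op = new SnapshotOp(guard, failOnTake);
        try {
            op.takeSnapshot();
        } catch (Exception e) {
            // Without this release, the lease taken in the constructor would leak.
            op.releaseResources();
            throw e;
        }
        return new FutureTask<String>(op::materializeSnapshot) {
            @Override
            protected void done() {
                // Normal path: release once the task completes or is cancelled.
                op.releaseResources();
            }
        };
    }
}
```

With this shape, a call such as SnapshotSketch.snapshot(guard, true) rethrows the simulated failure but leaves guard.outstanding() at 0, so a later shutdown that waits for all leases to be returned is not blocked by the failed snapshot.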