[ https://issues.apache.org/jira/browse/FLINK-8559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Chesnay Schepler closed FLINK-8559.
-----------------------------------
    Resolution: Fixed

master: dbb81acb5a1d0f2a9521c6eef7eeb2436bb8004d

> Exceptions in RocksDBIncrementalSnapshotOperation#takeSnapshot cause job to get stuck
> -------------------------------------------------------------------------------------
>
>                 Key: FLINK-8559
>                 URL: https://issues.apache.org/jira/browse/FLINK-8559
>             Project: Flink
>          Issue Type: Bug
>          Components: State Backends, Checkpointing, Tests
>    Affects Versions: 1.4.0, 1.5.0
>            Reporter: Chesnay Schepler
>            Assignee: Chesnay Schepler
>            Priority: Blocker
>             Fix For: 1.5.0, 1.4.1
>
>
> In {{RocksDBKeyedStateBackend#snapshotIncrementally}} we find the following code:
>
> {code:java}
> final RocksDBIncrementalSnapshotOperation<K> snapshotOperation =
>     new RocksDBIncrementalSnapshotOperation<>(
>         this,
>         checkpointStreamFactory,
>         checkpointId,
>         checkpointTimestamp);
>
> snapshotOperation.takeSnapshot();
>
> return new FutureTask<KeyedStateHandle>(
>     new Callable<KeyedStateHandle>() {
>         @Override
>         public KeyedStateHandle call() throws Exception {
>             return snapshotOperation.materializeSnapshot();
>         }
>     }
> ) {
>     @Override
>     public boolean cancel(boolean mayInterruptIfRunning) {
>         snapshotOperation.stop();
>         return super.cancel(mayInterruptIfRunning);
>     }
>
>     @Override
>     protected void done() {
>         snapshotOperation.releaseResources(isCancelled());
>     }
> };
> {code}
>
> In the constructor of {{RocksDBIncrementalSnapshotOperation}} we call
> {{acquireResource()}} on the RocksDB {{ResourceGuard}}. If
> {{snapshotOperation.takeSnapshot()}} fails with an exception, these resources
> are never released: the {{FutureTask}} is never created, so its {{done()}}
> hook, which calls {{releaseResources}}, never runs. When the task is shut down
> due to the exception, it gets stuck while releasing RocksDB, because closing
> the {{ResourceGuard}} blocks until all acquired resources have been released.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
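The leak described in the issue comes down to one missing error path: a lease acquired in a constructor is only released from {{FutureTask#done()}}, and that hook never exists if {{takeSnapshot()}} throws first. The following is a minimal, self-contained sketch of that pattern and one way to close the gap (release on failure, then rethrow). It is not the code from commit dbb81acb; {{SimpleResourceGuard}}, {{SnapshotOperation}}, {{snapshotIncrementally}} and {{closeWithTimeout}} are hypothetical simplifications, not Flink's API. The timeout exists only so the demo cannot hang; the issue describes Flink's {{ResourceGuard}} as blocking until every lease is released.

```java
import java.util.concurrent.FutureTask;

public class IncrementalSnapshotSketch {

    /** Hypothetical stand-in for a resource guard: counts outstanding leases. */
    static final class SimpleResourceGuard {
        private int leases = 0;
        private boolean closed = false;

        synchronized void acquireResource() {
            if (closed) {
                throw new IllegalStateException("guard already closed");
            }
            leases++;
        }

        synchronized void releaseResource() {
            leases--;
            notifyAll(); // wake up a thread waiting in closeWithTimeout
        }

        /** Waits until all leases are released; returns false if the timeout expires (the "stuck" case). */
        synchronized boolean closeWithTimeout(long millis) throws InterruptedException {
            closed = true;
            long deadline = System.currentTimeMillis() + millis;
            while (leases > 0) {
                long remaining = deadline - System.currentTimeMillis();
                if (remaining <= 0) {
                    return false;
                }
                wait(remaining);
            }
            return true;
        }
    }

    /** Hypothetical snapshot operation that, like the one in the issue, acquires a lease in its constructor. */
    static final class SnapshotOperation {
        private final SimpleResourceGuard guard;
        private final boolean failOnSnapshot;
        private boolean released = false;

        SnapshotOperation(SimpleResourceGuard guard, boolean failOnSnapshot) {
            this.guard = guard;
            this.failOnSnapshot = failOnSnapshot;
            guard.acquireResource();
        }

        void takeSnapshot() throws Exception {
            if (failOnSnapshot) {
                throw new Exception("simulated takeSnapshot failure");
            }
        }

        String materializeSnapshot() {
            return "handle";
        }

        /** Idempotent, so both the error path and FutureTask#done may call it safely. */
        void releaseResources(boolean canceled) {
            if (!released) {
                released = true;
                guard.releaseResource();
            }
        }
    }

    static FutureTask<String> snapshotIncrementally(SimpleResourceGuard guard, boolean fail) throws Exception {
        final SnapshotOperation op = new SnapshotOperation(guard, fail);
        try {
            op.takeSnapshot();
        } catch (Exception e) {
            // The FutureTask below is never created, so done() cannot release the lease: do it here.
            op.releaseResources(true);
            throw e;
        }
        return new FutureTask<String>(op::materializeSnapshot) {
            @Override
            protected void done() {
                op.releaseResources(isCancelled());
            }
        };
    }

    public static void main(String[] args) throws Exception {
        SimpleResourceGuard guard = new SimpleResourceGuard();
        try {
            snapshotIncrementally(guard, true);
        } catch (Exception expected) {
            System.out.println("snapshot failed: " + expected.getMessage());
        }
        // Because the failure path released its lease, closing the guard does not block.
        System.out.println("guard closed cleanly: " + guard.closeWithTimeout(1000));
    }
}
```

Deleting the catch block in {{snapshotIncrementally}} reproduces the reported symptom in miniature: the failed snapshot keeps its lease, and {{closeWithTimeout}} returns false instead of completing. Making {{releaseResources}} idempotent keeps the success path, where {{done()}} performs the release, unaffected.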