[ https://issues.apache.org/jira/browse/FLINK-5960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16004100#comment-16004100 ]
ramkrishna.s.vasudevan commented on FLINK-5960:
-----------------------------------------------

I can work on this too, if no one else is already working on it.

> Make CheckpointCoordinator less blocking
> ----------------------------------------
>
>                 Key: FLINK-5960
>                 URL: https://issues.apache.org/jira/browse/FLINK-5960
>             Project: Flink
>          Issue Type: Improvement
>          Components: State Backends, Checkpointing
>    Affects Versions: 1.2.0, 1.3.0
>            Reporter: Till Rohrmann
>
> Currently the {{CheckpointCoordinator}} performs its operations under a global
> lock. This also includes writing checkpoint data out to a state storage. If
> this operation blocks, then the whole checkpoint operator stands still. I
> think we should rework the {{CheckpointCoordinator}} to make fewer
> assumptions about external systems, so that it tolerates write failures and
> timeouts. Furthermore, we should try to limit the scope of locking and the
> execution of potentially blocking operations under the lock. This will
> improve the runtime behaviour of the {{CheckpointCoordinator}}.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)