[ https://issues.apache.org/jira/browse/FLINK-4939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15628529#comment-15628529 ]
ASF GitHub Bot commented on FLINK-4939: --------------------------------------- Github user zentol commented on a diff in the pull request: https://github.com/apache/flink/pull/2707#discussion_r86109500 --- Diff: flink-streaming-java/src/main/java/org/apache/flink/streaming/runtime/operators/GenericWriteAheadSink.java --- @@ -142,46 +184,62 @@ private void cleanState() throws Exception { public void notifyOfCompletedCheckpoint(long checkpointId) throws Exception { super.notifyOfCompletedCheckpoint(checkpointId); - synchronized (state.pendingHandles) { - Set<Long> pastCheckpointIds = state.pendingHandles.keySet(); - Set<Long> checkpointsToRemove = new HashSet<>(); - for (Long pastCheckpointId : pastCheckpointIds) { + synchronized (pendingHandles) { + Set<PendingCheckpointId> checkpointsToRemove = new HashSet<>(); + + for (Map.Entry<PendingCheckpointId, PendingHandle> pendingHandle : pendingHandles.entrySet()) { + + PendingCheckpointId pendingId = pendingHandle.getKey(); + long pastCheckpointId = pendingId.checkpointId; + int subtaskId = pendingId.subtaskId; + if (pastCheckpointId <= checkpointId) { + + PendingHandle handle = pendingHandle.getValue(); + long timestamp = handle.timestamp; + StreamStateHandle streamHandle = handle.stateHandle; + try { - if (!committer.isCheckpointCommitted(pastCheckpointId)) { - Tuple2<Long, StreamStateHandle> handle = state.pendingHandles.get(pastCheckpointId); - try (FSDataInputStream in = handle.f1.openInputStream()) { + if (!committer.isCheckpointCommitted(subtaskId, pastCheckpointId)) { + try (FSDataInputStream in = streamHandle.openInputStream()) { boolean success = sendValues( new ReusingMutableToRegularIteratorWrapper<>( new InputViewIterator<>( new DataInputViewStreamWrapper( in), serializer), serializer), - handle.f0); - if (success) { //if the sending has failed we will retry on the next notify - committer.commitCheckpoint(pastCheckpointId); - checkpointsToRemove.add(pastCheckpointId); + timestamp); + if (success) { + // in case the checkpoint was successfully committed, discard its state from the + // backend and mark it for removal + // in case it failed, we retry on the next checkpoint + --- End diff -- remove empty line > GenericWriteAheadSink: Decouple the creating from the committing subtask for > a pending checkpoint > ------------------------------------------------------------------------------------------------- > > Key: FLINK-4939 > URL: https://issues.apache.org/jira/browse/FLINK-4939 > Project: Flink > Issue Type: Improvement > Components: Cassandra Connector > Reporter: Kostas Kloudas > Assignee: Kostas Kloudas > Fix For: 1.2.0 > > > So far the GenericWriteAheadSink expected that > the subtask that wrote a pending checkpoint to the > state backend, will be also the one to commit it to > the third-party storage system. > This issue targets at removing this assumption. To do this > the CheckpointCommitter has to be able to dynamically > take the subtaskIdx as a parameter when asking > if a checkpoint was committed and also change the > state kept by the GenericWriteAheadSink to also > include that subtask index of the subtask that wrote > the pending checkpoint. > This change is also necessary for making the operator rescalable. -- This message was sent by Atlassian JIRA (v6.3.4#6332)