pnowojski commented on a change in pull request #17135:
URL: https://github.com/apache/flink/pull/17135#discussion_r704100504



##########
File path: docs/content/docs/dev/datastream/fault-tolerance/checkpointing.md
##########
@@ -239,8 +239,9 @@ important observation that puts certain requirements on the 
implementation of an
 or UDFs. In order to support checkpointing with tasks that finish we adjusted 
the [task lifecycle]({{< ref "docs/internals/task_lifecycle" >}})
 and introduced the {{< javadoc 
file="org/apache/flink/streaming/api/operators/StreamOperator.html#finish--" 
name="StreamOperator#finish" >}}
 method. The method is expected to be a clear cutoff point for flushing any 
remaining buffered state.
-All checkpoints taken after the `finish` method has been called should not 
contain any significant state that is required
-after a restore. What does it mean in details?
+All checkpoints taken after the `finish` method has been called can contain 
only pointers to the last transactions
+that will be closed in the final checkpoint or in case of a failure should be 
closed when restoring.
+Apart from those pointers it should be empty. What does it mean in details?

Review comment:
       ```suggestion
   All checkpoints taken after the `finish` method has been called should in 
most cases be empty and
   shouldn't contain any buffered data, as there will be no way to emit this 
data. One notable 
   exception is if your operator holds pointers to transactions in external 
systems, for example in 
   order to implement exactly-once semantics. In such a case, checkpoints taken 
after invoking the `finish()`
   method should keep a pointer to the last transaction(s) that will be 
committed in the final checkpoint
   before the operator is closed. Good built-in examples of this are 
exactly-once sinks and the `TwoPhaseCommitSinkFunction`.
   What does this mean in more detail?
   ```
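
A rough sketch of the behaviour described in the suggestion, in case it helps the discussion. It deliberately does not use Flink's API: `BufferingTransactionalSink`, `Snapshot`, and all method bodies are hypothetical stand-ins that only mimic the order `processElement` -> `finish` -> `snapshotState` -> `notifyCheckpointComplete`. It also assumes, for simplicity, that a transaction is only opened in `finish()`; a real two-phase-commit sink would also pre-commit on every checkpoint.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical stand-in for a buffering, transactional sink operator.
 * Not Flink code: it only illustrates what a checkpoint taken after
 * finish() may and may not contain.
 */
class BufferingTransactionalSink {

    /** What a checkpoint of this operator would contain. */
    static final class Snapshot {
        final List<String> bufferedRecords;
        final List<Long> pendingTransactionIds;

        Snapshot(List<String> buffered, List<Long> pending) {
            // Copy so that later changes to the operator do not alter the snapshot.
            this.bufferedRecords = new ArrayList<>(buffered);
            this.pendingTransactionIds = new ArrayList<>(pending);
        }
    }

    private final List<String> buffer = new ArrayList<>();
    private final List<Long> pendingTransactions = new ArrayList<>();
    private final List<Long> committedTransactions = new ArrayList<>();
    private long nextTransactionId = 1;

    /** Records are buffered until the cutoff point. */
    void processElement(String record) {
        buffer.add(record);
    }

    /**
     * Cutoff point: everything buffered is flushed into an (imaginary)
     * external transaction, and only the transaction id is kept.
     */
    void finish() {
        if (!buffer.isEmpty()) {
            pendingTransactions.add(nextTransactionId++);
            buffer.clear();
        }
    }

    /** After finish(), this holds no buffered data, only transaction pointers. */
    Snapshot snapshotState() {
        return new Snapshot(buffer, pendingTransactions);
    }

    /** The final checkpoint completed: commit whatever is still pending. */
    void notifyCheckpointComplete() {
        committedTransactions.addAll(pendingTransactions);
        pendingTransactions.clear();
    }

    List<Long> committedTransactions() {
        return committedTransactions;
    }
}
```

With this shape, a snapshot taken after `finish()` has an empty `bufferedRecords` list and a single pending transaction id, which is then committed once the final checkpoint is acknowledged.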

##########
File path: docs/content/docs/dev/datastream/fault-tolerance/checkpointing.md
##########
@@ -239,8 +239,9 @@ important observation that puts certain requirements on the 
implementation of an
 or UDFs. In order to support checkpointing with tasks that finish we adjusted 
the [task lifecycle]({{< ref "docs/internals/task_lifecycle" >}})
 and introduced the {{< javadoc 
file="org/apache/flink/streaming/api/operators/StreamOperator.html#finish--" 
name="StreamOperator#finish" >}}
 method. The method is expected to be a clear cutoff point for flushing any 
remaining buffered state.
-All checkpoints taken after the `finish` method has been called should not 
contain any significant state that is required
-after a restore. What does it mean in details?
+All checkpoints taken after the `finish` method has been called can contain 
only pointers to the last transactions
+that will be closed in the final checkpoint or in case of a failure should be 
closed when restoring.
+Apart from those pointers it should be empty. What does it mean in details?

Review comment:
       
   And what do you mean by:
   > or in case of a failure should be closed when restoring.
   
   ? closed -> committed? or aborted? I guess you mean committing when a 
failure happens before `notifyCheckpointComplete()` reaches the task? In such a 
case I would skip this as too much detail.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
