pnowojski commented on a change in pull request #17135:
URL: https://github.com/apache/flink/pull/17135#discussion_r704100504
##########
File path: docs/content/docs/dev/datastream/fault-tolerance/checkpointing.md
##########
@@ -239,8 +239,9 @@ important observation that puts certain requirements on the
implementation of an
or UDFs. In order to support checkpointing with tasks that finish we adjusted
the [task lifecycle]({{< ref "docs/internals/task_lifecycle" >}})
and introduced the {{< javadoc
file="org/apache/flink/streaming/api/operators/StreamOperator.html#finish--"
name="StreamOperator#finish" >}}
method. The method is expected to be a clear cutoff point for flushing any
remaining buffered state.
-All checkpoints taken after the `finish` method has been called should not
contain any significant state that is required
-after a restore. What does it mean in details?
+All checkpoints taken after the `finish` method has been called can contain
only pointers to the last transactions
+that will be closed in the final checkpoint or in case of a failure should be
closed when restoring.
+Apart from those pointers it should be empty. What does it mean in details?
Review comment:
```suggestion
All checkpoints taken after the `finish` method has been called should in most cases be empty and
shouldn't contain any buffered data, as there will be no way to emit this data. One notable
exception is if your operator holds pointers to transactions in external systems, for example in
order to implement exactly-once semantics. In such a case, checkpoints taken after invoking the `finish()`
method should keep a pointer to the last transaction(s) that will be committed in the final checkpoint
before the operator is closed. Good built-in examples of this are exactly-once sinks and the `TwoPhaseCommitSinkFunction`.
What does this mean in more detail?
```
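To make the intent of the suggested wording concrete, here is a minimal sketch of the lifecycle it describes. It is a simplified, hypothetical model and not Flink code: `FinishingSinkSketch`, `Snapshot`, and all method bodies are assumptions for illustration, and only the method names loosely mirror `finish()`, `snapshotState()` and `notifyCheckpointComplete()`. After `finish()`, the snapshot holds no buffered records, only a pointer to the pre-committed transaction.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of an exactly-once sink. None of these
// names are Flink APIs; they only mirror the lifecycle described above.
class FinishingSinkSketch {

    /** What a checkpoint would contain: buffered records and transaction pointers. */
    static final class Snapshot {
        final List<String> bufferedRecords;
        final List<String> transactionPointers;

        Snapshot(List<String> bufferedRecords, List<String> transactionPointers) {
            this.bufferedRecords = bufferedRecords;
            this.transactionPointers = transactionPointers;
        }
    }

    private final List<String> buffer = new ArrayList<>();
    private final List<String> pendingTransactions = new ArrayList<>();
    private final List<String> committedTransactions = new ArrayList<>();
    private int nextTransactionId = 0;
    private boolean finished = false;

    void processElement(String record) {
        if (finished) {
            throw new IllegalStateException("No records may arrive after finish()");
        }
        buffer.add(record);
    }

    /** Cutoff point: flush everything buffered into a pre-committed transaction. */
    void finish() {
        if (!buffer.isEmpty()) {
            pendingTransactions.add("txn-" + nextTransactionId++);
            buffer.clear();
        }
        finished = true;
    }

    /** Copies the current state; after finish() only transaction pointers remain. */
    Snapshot snapshotState() {
        return new Snapshot(new ArrayList<>(buffer), new ArrayList<>(pendingTransactions));
    }

    /** Simplification: commits every pending transaction once a checkpoint completes. */
    void notifyCheckpointComplete() {
        committedTransactions.addAll(pendingTransactions);
        pendingTransactions.clear();
    }

    List<String> committedTransactions() {
        return committedTransactions;
    }

    public static void main(String[] args) {
        FinishingSinkSketch sink = new FinishingSinkSketch();
        sink.processElement("a");
        sink.processElement("b");
        sink.finish();

        Snapshot afterFinish = sink.snapshotState();
        System.out.println("buffered: " + afterFinish.bufferedRecords);
        System.out.println("pointers: " + afterFinish.transactionPointers);

        sink.notifyCheckpointComplete();
        System.out.println("committed: " + sink.committedTransactions());
    }
}
```

The sketch commits all pending transactions on any completed checkpoint, which is coarser than a real two-phase commit sink that tracks transactions per checkpoint; it is only meant to show which state survives `finish()`.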
##########
File path: docs/content/docs/dev/datastream/fault-tolerance/checkpointing.md
##########
@@ -239,8 +239,9 @@ important observation that puts certain requirements on the
implementation of an
or UDFs. In order to support checkpointing with tasks that finish we adjusted
the [task lifecycle]({{< ref "docs/internals/task_lifecycle" >}})
and introduced the {{< javadoc
file="org/apache/flink/streaming/api/operators/StreamOperator.html#finish--"
name="StreamOperator#finish" >}}
method. The method is expected to be a clear cutoff point for flushing any
remaining buffered state.
-All checkpoints taken after the `finish` method has been called should not
contain any significant state that is required
-after a restore. What does it mean in details?
+All checkpoints taken after the `finish` method has been called can contain
only pointers to the last transactions
+that will be closed in the final checkpoint or in case of a failure should be
closed when restoring.
+Apart from those pointers it should be empty. What does it mean in details?
Review comment:
And what do you mean by:
> or in case of a failure should be closed when restoring.

Does "closed" mean committed, or aborted? I guess you mean committing when a failure
happens before `notifyCheckpointComplete()` has reached the task? In that case I
would skip this as too much detail.