dawidwys commented on a change in pull request #17135:
URL: https://github.com/apache/flink/pull/17135#discussion_r704343221
##########
File path: docs/content/docs/dev/datastream/fault-tolerance/checkpointing.md
##########
@@ -211,4 +223,45 @@ Flink currently only provides processing guarantees for jobs without iterations.
Please note that records in flight in the loop edges (and the state changes associated with them) will be lost during failure.
+## Checkpointing with parts of the graph finished *(BETA)*
+
+Starting from Flink 1.14 it is possible to continue performing checkpoints even if parts of the job graph have finished processing all data, because it was a bounded source. The feature must be enabled
+via a feature flag:
+
+```java
+Configuration config = new Configuration();
+config.set(ExecutionCheckpointingOptions.ENABLE_CHECKPOINTS_AFTER_TASKS_FINISH, true);
+StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(config);
+```
+
+Once tasks/subtasks are finished they don't contribute to the checkpoints any longer. It is an
+important observation that puts certain requirements on the implementation of any custom operators
+or UDFs. In order to support checkpointing with tasks that finish we adjusted the [task lifecycle]({{< ref "docs/internals/task_lifecycle" >}})
+and introduced the {{< javadoc file="org/apache/flink/streaming/api/operators/StreamOperator.html#finish--" name="StreamOperator#finish" >}}
+method. The method is expected to be a clear cutoff point for flushing any remaining buffered state.
+All checkpoints taken after the `finish` method has been called should be in most cases empty and
+shouldn't contain any buffered data, as there will be no way to emit this data. One notable
+exception is if your operator has some pointers to transactions in external systems, for example in
+order to implement the exactly-once semantic. In such a case, checkpoints taken after invoking `finish()`
+method should keep a pointer to the last transaction(s) that will be committed in the final checkpoint
+before the operator is closed. A good built-in example of this are exactly-once sinks and the
+`TwoPhaseCommitSinkFunction`. What does it mean in more details?
Review comment:
My idea is to have it there to make the text less dull and to catch the reader's
attention again for another important section. I don't have a strong attachment
to the sentence though ;)
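
As an aside on the hunk itself, the `finish` contract it describes could be illustrated with a small sketch along the following lines. This is a plain-Java model only: `FinishContractSketch` and `BufferingOperator` are hypothetical names, the class does not use or extend any Flink type, and `snapshotState()` here is just a stand-in for "what a checkpoint would contain".

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-Java model of the finish() contract; hypothetical, not a Flink class. */
public class FinishContractSketch {

    /** Hypothetical operator that buffers records instead of emitting them right away. */
    static class BufferingOperator {
        private final List<String> buffer = new ArrayList<>();
        private final List<String> downstream;
        private boolean finished = false;

        BufferingOperator(List<String> downstream) {
            this.downstream = downstream;
        }

        void processElement(String record) {
            if (finished) {
                throw new IllegalStateException("no input is expected after finish()");
            }
            buffer.add(record);
        }

        /** Plays the role of StreamOperator#finish: the last chance to emit buffered data. */
        void finish() {
            downstream.addAll(buffer);
            buffer.clear();
            finished = true;
        }

        /** Stand-in for a checkpoint: a copy of whatever is still buffered. */
        List<String> snapshotState() {
            return new ArrayList<>(buffer);
        }
    }

    public static void main(String[] args) {
        List<String> out = new ArrayList<>();
        BufferingOperator op = new BufferingOperator(out);
        op.processElement("a");
        op.processElement("b");
        // Before finish(): buffered records are part of the snapshot.
        System.out.println("before finish: " + op.snapshotState());
        op.finish();
        // After finish(): everything was flushed, so the snapshot is empty.
        System.out.println("after finish:  " + op.snapshotState());
        System.out.println("emitted:       " + out);
    }
}
```

The sketch deliberately leaves out the transactional exception mentioned in the hunk; an operator holding pointers to external transactions would still keep those in snapshots taken after `finish`.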