pnowojski commented on a change in pull request #17135:
URL: https://github.com/apache/flink/pull/17135#discussion_r702671777



##########
File path: docs/content/docs/dev/datastream/fault-tolerance/checkpointing.md
##########
@@ -147,6 +154,11 @@ env.getCheckpointConfig.enableUnalignedCheckpoints()
 
 // sets the checkpoint storage where checkpoint snapshots will be written
 env.getCheckpointConfig.setCheckpointStorage("hdfs:///my/checkpoint/dir")
+
+// enable checkpointing with finished tasks
+val config = new Configuration()
+config.set(ExecutionCheckpointingOptions.ENABLE_CHECKPOINTS_AFTER_TASKS_FINISH,
 true)
+env.configure(config)

Review comment:
       What is the python way to enable this feature?

##########
File path: docs/content/docs/dev/datastream/fault-tolerance/checkpointing.md
##########
@@ -211,4 +223,40 @@ Flink currently only provides processing guarantees for 
jobs without iterations.
 
 Please note that records in flight in the loop edges (and the state changes 
associated with them) will be lost during failure.
 
+## Checkpointing with parts of the graph finished *(BETA)*
+
+Starting from Flink 1.14 it is possible to continue performing checkpoints 
even if parts of the job graph have finished processing all data, because it 
was a bounded source. The feature must be enabled
+via a feature flag:
+
+```java
+Configuration config = new Configuration();
+config.set(ExecutionCheckpointingOptions.ENABLE_CHECKPOINTS_AFTER_TASKS_FINISH,
 true);
+StreamExecutionEnvironment env = 
StreamExecutionEnvironment.getExecutionEnvironment(config);
+```
+
+Once tasks/subtasks are finished they don't contribute to the checkpoints any 
longer. It is an
+important observation that puts certain requirements on the implementation of 
any custom operators
+or UDFs. In order to support checkpointing with tasks that finish we adjusted 
the [task lifecycle]({{< ref "docs/internals/task_lifecycle" >}})
+and introduced the {{< javadoc 
file="org/apache/flink/streaming/api/operators/StreamOperator.html#finish--" 
name="StreamOperator#finish" >}}
+method. The method is expected to be a clear cutoff point for flushing any 
remaining buffered state.
+All checkpoints taken after the `finish` method has been called should not 
contain any significant state that is required
+after a restore. What does it mean in details?

Review comment:
       This and later `discarded` is a bit confusing, as the finished subtasks 
still can/should have a state pointing to side effects (2pc). It should be 
phrased that after calling `finish()` state should be empty except of pointers 
to commit side effects, like external transactions.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to