[GitHub] [flink] gaoyunhaii commented on a change in pull request #17135: [FLINK-24098] Document FLIP-147 capabiliites and limitations

GitBox Wed, 08 Sep 2021 01:07:56 -0700


gaoyunhaii commented on a change in pull request #17135:
URL: https://github.com/apache/flink/pull/17135#discussion_r704116077




##########
File path: docs/content/docs/dev/datastream/fault-tolerance/checkpointing.md
##########
@@ -83,6 +83,8 @@ Other parameters for checkpointing include:
 
   - *unaligned checkpoints*: You can enable [unaligned checkpoints]({{< ref 
"docs/ops/state/unaligned_checkpoints" >}}) to greatly reduce checkpointing 
times under backpressure. Only works for exactly-once checkpoints and with 
number of concurrent checkpoints of 1.
 
+  - *checkpoints with finished tasks*: You can enable an experimental feature 
to continue performing checkpoints even if parts of the DAG have finished 
processing all of their records. Before doing so please read through the 
[important considerations](#checkpointing-with-parts-of-the-graph-finished)

Review comment:
       dot missed at the end of line.

##########
File path: docs/content/docs/dev/datastream/fault-tolerance/checkpointing.md
##########
@@ -211,4 +223,41 @@ Flink currently only provides processing guarantees for 
jobs without iterations.
 
 Please note that records in flight in the loop edges (and the state changes 
associated with them) will be lost during failure.
 
+## Checkpointing with parts of the graph finished *(BETA)*
+
+Starting from Flink 1.14 it is possible to continue performing checkpoints 
even if parts of the job graph have finished processing all data, because it 
was a bounded source. The feature must be enabled

Review comment:
       Perhaps `it was a bounded source` -> `all the sources are bounded`  or 
`it used only bounded sources` ? 

##########
File path: docs/content/docs/dev/datastream/fault-tolerance/checkpointing.md
##########
@@ -211,4 +223,45 @@ Flink currently only provides processing guarantees for 
jobs without iterations.
 
 Please note that records in flight in the loop edges (and the state changes 
associated with them) will be lost during failure.
 
+## Checkpointing with parts of the graph finished *(BETA)*
+
+Starting from Flink 1.14 it is possible to continue performing checkpoints 
even if parts of the job graph have finished processing all data, because it 
was a bounded source. The feature must be enabled
+via a feature flag:
+
+```java
+Configuration config = new Configuration();
+config.set(ExecutionCheckpointingOptions.ENABLE_CHECKPOINTS_AFTER_TASKS_FINISH,
 true);
+StreamExecutionEnvironment env = 
StreamExecutionEnvironment.getExecutionEnvironment(config);
+```
+
+Once tasks/subtasks are finished they don't contribute to the checkpoints any 
longer. It is an
+important observation that puts certain requirements on the implementation of 
any custom operators
+or UDFs. In order to support checkpointing with tasks that finish we adjusted 
the [task lifecycle]({{< ref "docs/internals/task_lifecycle" >}})
+and introduced the {{< javadoc 
file="org/apache/flink/streaming/api/operators/StreamOperator.html#finish--" 
name="StreamOperator#finish" >}}
+method. The method is expected to be a clear cutoff point for flushing any 
remaining buffered state.
+All checkpoints taken after the `finish` method has been called should be in 
most cases empty and
+shouldn't contain any buffered data, as there will be no way to emit this 
data. One notable
+exception is if your operator has some pointers to transactions in external 
systems, for example in
+order to implement the exactly-once semantic. In such a case, checkpoints 
taken after invoking `finish()`
+method should keep a pointer to the last transaction(s) that will be committed 
in the final checkpoint
+before the operator is closed. A good built-in example of this are 
exactly-once sinks and the 
+`TwoPhaseCommitSinkFunction`. What does it mean in more details?

Review comment:
       Do we need to remove the "What does it mean in more details?"  ? 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

[GitHub] [flink] gaoyunhaii commented on a change in pull request #17135: [FLINK-24098] Document FLIP-147 capabiliites and limitations

Reply via email to