[ https://issues.apache.org/jira/browse/FLINK-7449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16171693#comment-16171693 ]
ASF GitHub Bot commented on FLINK-7449:
---------------------------------------

Github user alpinegizmo commented on a diff in the pull request:

    https://github.com/apache/flink/pull/4543#discussion_r139684257

--- Diff: docs/ops/state/checkpoints.md ---
@@ -99,3 +99,296 @@ above).

 ```sh
 $ bin/flink run -s :checkpointMetaDataPath [:runArgs]
 ```
+
+## Incremental Checkpoints
+
+### Synopsis
+
+Incremental checkpoints can significantly reduce checkpointing time in comparison to full checkpoints, at the cost of a
+(potentially) longer recovery time. The core idea is that incremental checkpoints only record changes in state since the
+previously-completed checkpoint instead of producing a full, self-contained backup of the state backend. In this way,
+incremental checkpoints can build upon previous checkpoints.
+
+`RocksDBStateBackend` is currently the only backend that supports incremental checkpoints.
+
+Flink leverages RocksDB's internal backup mechanism in a way that is self-consolidating over time. As a result, the
+incremental checkpoint history in Flink does not grow indefinitely, and old checkpoints are eventually subsumed and
+pruned automatically.
+
+While we strongly encourage the use of incremental checkpoints for Flink jobs with large state, please note that this is
+a new feature and currently not enabled by default.
+
+To enable this feature, users can instantiate a `RocksDBStateBackend` with the corresponding boolean flag in the
+constructor set to `true`, e.g.:
+
+```java
+    RocksDBStateBackend backend =
+        new RocksDBStateBackend(filebackend, true);
+```
+
+### Use-case for Incremental Checkpoints
+
+Checkpoints are the centerpiece of Flink's fault tolerance mechanism, and each checkpoint represents a consistent
+snapshot of the distributed state of a Flink job from which the system can recover in case of a software or machine
+failure (see [here]({{ site.baseurl }}/internals/stream_checkpointing.html)).
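The constructor call quoted in the diff above can be placed into a complete job setup. The following is a hedged sketch, not part of the quoted patch: the checkpoint URI and interval are placeholder assumptions, and the exact constructor and `setStateBackend` signatures may vary between Flink versions.

```java
import java.io.IOException;

import org.apache.flink.contrib.streaming.state.RocksDBStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class IncrementalCheckpointJob {

    public static void main(String[] args) throws IOException {
        StreamExecutionEnvironment env =
            StreamExecutionEnvironment.getExecutionEnvironment();

        // Take a checkpoint every 60 seconds; with incremental checkpoints
        // enabled, each one only uploads the RocksDB files created since
        // the previously-completed checkpoint.
        env.enableCheckpointing(60_000);

        // Placeholder URI -- point this at your cluster's durable storage.
        String checkpointDataUri = "hdfs:///flink/checkpoints";

        // The second constructor argument enables incremental checkpoints.
        env.setStateBackend(new RocksDBStateBackend(checkpointDataUri, true));

        // ... define sources, transformations, and sinks here ...
    }
}
```

Note that incremental checkpointing is chosen per job via the state backend; jobs that keep the default flag (`false`) continue to take full, self-contained checkpoints.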
+
+Flink creates checkpoints periodically to track the progress of a job so that, in case of failure, only those
+(hopefully few) *events that have been processed after the last completed checkpoint* must be reprocessed from the data
+source. The number of events that must be reprocessed has implications for recovery time, and so for fastest recovery,
+we want to *take checkpoints as often as possible*.
+
+However, checkpoints are not without performance cost and can introduce *considerable overhead* to the system. This
+overhead can lead to lower throughput and higher latency during the time that checkpoints are created. One reason is
+that, traditionally, each checkpoint in Flink always represented the *complete state* of the job at the time of the
+checkpoint, and all of the state had to be written to stable storage (typically some distributed file system) for every
+single checkpoint. Writing multiple terabytes (or more) of state data for each checkpoint can obviously create
+significant load for the I/O and network subsystems, on top of the normal load from pipeline's data processing work.
--- End diff --

on top of the normal load from the pipeline's data processing work.

(add "the")

> Improve and enhance documentation for incremental checkpoints
> -------------------------------------------------------------
>
>                 Key: FLINK-7449
>                 URL: https://issues.apache.org/jira/browse/FLINK-7449
>             Project: Flink
>          Issue Type: Improvement
>          Components: Documentation
>    Affects Versions: 1.4.0
>            Reporter: Stefan Richter
>            Assignee: Stefan Richter
>            Priority: Minor
>
> We should provide more details about incremental checkpoints in the
> documentation.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)