carp84 commented on a change in pull request #10987: [FLINK-14495][docs]
(EXTENDED) Add documentation for memory control of RocksDB state backend
URL: https://github.com/apache/flink/pull/10987#discussion_r373759898
##########
File path: docs/ops/state/state_backends.md
##########
@@ -186,8 +188,145 @@ state.backend: filesystem
state.checkpoints.dir: hdfs://namenode:40010/flink/checkpoints
{% endhighlight %}
-#### RocksDB State Backend Config Options
-{% include generated/rocks_db_configuration.html %}
+## RocksDB State Backend Details
+
+*This section describes the RocksDB state backend in more detail.*
+
+### Incremental Checkpoints
+
+RocksDB supports *Incremental Checkpoints*, which can dramatically reduce the
checkpointing time in comparison to full checkpoints.
+Instead of producing a full, self-contained backup of the state backend,
incremental checkpoints only record the changes that happened since the latest
completed checkpoint.
+
+An incremental checkpoint builds upon (typically multiple) previous
checkpoints. Flink leverages RocksDB's internal compaction mechanism in a way
that is self-consolidating over time. As a result, the incremental checkpoint
history in Flink does not grow indefinitely, and old checkpoints are eventually
subsumed and pruned automatically.
+
+Recovery time from an incremental checkpoint may be longer or shorter compared to a full checkpoint. If your network bandwidth is the bottleneck, restoring from an incremental checkpoint may take a bit longer, because it implies fetching more data (more deltas). If the bottleneck is your CPU or IOPS, restoring from an incremental checkpoint is faster, because it avoids re-building the local RocksDB tables from Flink's canonical key/value snapshot format (used in savepoints and full checkpoints).
+
+While we encourage the use of incremental checkpoints for large state, you
need to enable this feature manually:
+ - Setting a default in your `flink-conf.yaml`: `state.backend.incremental:
true`
+ - Configuring this in code (overrides the config default):
`RocksDBStateBackend backend = new RocksDBStateBackend(filebackend, true);`
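
As a minimal sketch of the configuration-file approach, a `flink-conf.yaml` combining the options shown above might look like this (the checkpoint directory is the example value used earlier in this page):

```yaml
# flink-conf.yaml: use the RocksDB state backend with incremental checkpoints
state.backend: rocksdb
state.checkpoints.dir: hdfs://namenode:40010/flink/checkpoints
state.backend.incremental: true
```

Note that a value passed to the `RocksDBStateBackend` constructor in code overrides this configuration default.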
+
+### Memory Management
+
+Flink aims to control the total process memory consumption to make sure that the Flink TaskManagers have a well-behaved memory footprint. That means staying within the limits enforced by the environment (Docker/Kubernetes, Yarn, etc.) so as not to get killed for consuming too much memory, but also not to under-utilize memory (unnecessary spilling to disk, wasted caching opportunities, reduced performance).
+
+To achieve that, Flink by default limits RocksDB's memory allocation to the amount of managed memory of the TaskManager (or, more precisely, of the task slot). This should give a good out-of-the-box experience for most applications, meaning most applications should not need to tune any of the detailed RocksDB settings. The primary mechanism for addressing memory-related performance issues is to simply increase Flink's managed memory.
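
As a hedged sketch, increasing managed memory could be done in `flink-conf.yaml` along these lines (the option names below assume the Flink 1.10+ unified memory model; check the memory configuration reference for your version):

```yaml
# flink-conf.yaml: increase the managed memory available to RocksDB
# (option names assume Flink 1.10+; adjust for your version)
taskmanager.memory.managed.fraction: 0.4
# alternatively, set an absolute size instead of a fraction:
# taskmanager.memory.managed.size: 2gb
```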
+
+Uses can choose to deactivate that feature and let RocksDB allocate memory independently per ColumnFamily (one per state per operator). This ultimately offers expert users more fine-grained control over RocksDB, but means that users need to make sure themselves that the overall memory consumption does not exceed the limits of the environment. See [large state tuning]({{ site.baseurl }}/ops/state/large_state_tuning.html#tuning-rocksdb-memory) for guidelines on large state performance tuning.
Review comment:
Uses -> Users, at the very beginning.