sjwiesman commented on a change in pull request #10498: [FLINK-14495][docs] Add documentation for memory control of RocksDB state backend
URL: https://github.com/apache/flink/pull/10498#discussion_r355601253
##########
File path: docs/ops/state/large_state_tuning.md
##########

@@ -210,6 +210,28 @@ and not from the JVM. Any memory you assign to RocksDB will have to be accounted
 of the TaskManagers by the same amount. Not doing that may result in YARN/Mesos/etc
 terminating the JVM processes for allocating more memory than configured.

+#### Bound total memory usage of RocksDB instance(s) per slot
+
+RocksDB allocates native memory without control of JVM, and might lead the process to exceed total memory budget of the container to get killed in container environment (e.g. Kubernetes).
+From Flink-1.10, we provide a solution to limit total memory usage for RocksDb instance(s) per slot by leveraging RocksDB's mechanism to
+share [cache](https://github.com/facebook/rocksdb/wiki/Block-Cache) and [write buffer manager](https://github.com/facebook/rocksdb/wiki/Write-Buffer-Manager) among instance(s).
+Generally speaking, we mainly have three parts of memory usage for RocksDB in Flink scenario: block cache, index & bloom filters and memtables
+(refer to [memory-usage-in-rocksdb](https://github.com/facebook/rocksdb/wiki/Memory-usage-in-RocksDB)).
+The basic idea is to share a `Cache` object with desired capacity among all RocksDB instances,
+and [cost memory used in memtable to that cache](https://github.com/facebook/rocksdb/wiki/Write-Buffer-Manager#cost-memory-used-in-memtable-to-block-cache) via write buffer manager.
+Besides, we also cache index & filters into that cache, then the major use of memory would be well capped.
+There exist two ways to enable this feature:

Review comment:
   What do you think about something like this for the first paragraph?

   > RocksDB allocates native memory outside of the JVM, which could lead the process to exceed the total memory budget. This can be especially problematic in containerized environments such as Kubernetes that kill processes that exceed their memory budgets. Flink can limit total memory usage for each RocksDB instance per slot by leveraging shareable [caches](https://github.com/facebook/rocksdb/wiki/Block-Cache) and [write buffer managers](https://github.com/facebook/rocksdb/wiki/Write-Buffer-Manager) among all instances in a single slot. These shared caches will place a hard upper limit on the [three components](https://github.com/facebook/rocksdb/wiki/Memory-usage-in-RocksDB) that use the majority of memory when RocksDB is deployed as a state backend: block cache, index and bloom filters, and MemTables.
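To make the mechanism described in the quoted section more concrete, below is a minimal sketch of the underlying RocksDB-level idea using the RocksJava API: one `LRUCache` shared across instances, a `WriteBufferManager` that charges memtable memory to that cache, and index & filter blocks cached there as well. This is only an illustration of the technique, not the code Flink uses internally, and the class and method names assume a recent RocksJava release.

```java
import org.rocksdb.BlockBasedTableConfig;
import org.rocksdb.LRUCache;
import org.rocksdb.Options;
import org.rocksdb.RocksDB;
import org.rocksdb.RocksDBException;
import org.rocksdb.WriteBufferManager;

public class SharedRocksDBMemorySketch {

    public static void main(String[] args) throws RocksDBException {
        RocksDB.loadLibrary();

        // One shared cache with the desired per-slot capacity (256 MB here).
        LRUCache sharedCache = new LRUCache(256 * 1024 * 1024L);

        // Charge memtable memory against the shared cache, so block cache,
        // index & filter blocks, and memtables all draw from one budget.
        WriteBufferManager writeBufferManager =
                new WriteBufferManager(128 * 1024 * 1024L, sharedCache);

        // Every RocksDB instance in the slot would be opened with options like these.
        BlockBasedTableConfig tableConfig = new BlockBasedTableConfig()
                .setBlockCache(sharedCache)
                // Keep index & bloom filter blocks in the shared cache as well,
                // so they count towards the same capacity.
                .setCacheIndexAndFilterBlocks(true);

        Options options = new Options()
                .setCreateIfMissing(true)
                .setWriteBufferManager(writeBufferManager)
                .setTableFormatConfig(tableConfig);

        try (RocksDB db = RocksDB.open(options, "/tmp/rocksdb-shared-cache-sketch")) {
            db.put("key".getBytes(), "value".getBytes());
        }
    }
}
```

With every instance in the slot wired to the same `LRUCache` and `WriteBufferManager`, the cache capacity becomes the effective upper bound on the three memory consumers the documentation lists.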

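The quoted hunk ends at "There exist two ways to enable this feature:" before listing them. As a hedged sketch only, the configuration keys below reflect the per-slot RocksDB memory options as introduced in Flink 1.10 (`state.backend.rocksdb.memory.managed` and `state.backend.rocksdb.memory.fixed-per-slot`); the key names here are assumptions, and the merged documentation is authoritative for the final names and defaults.

```java
import org.apache.flink.configuration.Configuration;

public class RocksDBMemoryOptionsSketch {

    public static void main(String[] args) {
        Configuration config = new Configuration();

        // Option 1 (assumed key name): derive the RocksDB budget from the slot's
        // managed memory, so all instances in the slot share one bounded cache.
        config.setString("state.backend.rocksdb.memory.managed", "true");

        // Option 2 (assumed key name): alternatively, fix an explicit per-slot
        // budget that the shared cache / write buffer manager must stay within.
        // config.setString("state.backend.rocksdb.memory.fixed-per-slot", "512mb");

        // In practice these entries live in flink-conf.yaml; printing here only
        // keeps the sketch self-contained and runnable.
        System.out.println(config);
    }
}
```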