sjwiesman commented on a change in pull request #10498: [FLINK-14495][docs] Add documentation for memory control of RocksDB state backend
URL: https://github.com/apache/flink/pull/10498#discussion_r355601253
##########
File path: docs/ops/state/large_state_tuning.md
##########

@@ -210,6 +210,28 @@ and not from the JVM. Any memory you assign to RocksDB will have to be accounted
 of the TaskManagers by the same amount. Not doing that may result in YARN/Mesos/etc
 terminating the JVM processes for allocating more memory than configured.

+#### Bound total memory usage of RocksDB instance(s) per slot
+
+RocksDB allocates native memory without control of JVM, and might lead the process to exceed total memory budget of the container to get killed in container environment (e.g. Kubernetes).
+From Flink-1.10, we provide a solution to limit total memory usage for RocksDb instance(s) per slot by leveraging RocksDB's mechanism to
+share [cache](https://github.com/facebook/rocksdb/wiki/Block-Cache) and [write buffer manager](https://github.com/facebook/rocksdb/wiki/Write-Buffer-Manager) among instance(s).
+Generally speaking, we mainly have three parts of memory usage for RocksDB in Flink scenario: block cache, index & bloom filters and memtables
+(refer to [memory-usage-in-rocksdb](https://github.com/facebook/rocksdb/wiki/Memory-usage-in-RocksDB)).
+The basic idea is to share a `Cache` object with desired capacity among all RocksDB instances,
+and [cost memory used in memtable to that cache](https://github.com/facebook/rocksdb/wiki/Write-Buffer-Manager#cost-memory-used-in-memtable-to-block-cache) via write buffer manager.
+Besides, we also cache index & filters into that cache, then the major use of memory would be well capped.
+There exist two ways to enable this feature:

Review comment:
   What do you think about something like this for the first paragraph?

   > RocksDB allocates native memory outside of the JVM, which could lead the process to exceed the total memory budget. This can be especially problematic in containerized environments such as Kubernetes that kill processes that exceed their memory budgets. Flink can limit total memory usage for each RocksDB instance per slot by leveraging shareable [caches](https://github.com/facebook/rocksdb/wiki/Block-Cache) and [write buffer managers](https://github.com/facebook/rocksdb/wiki/Write-Buffer-Manager) among all instances in a single slot. These shared caches will place a hard upper limit on the [three components](https://github.com/facebook/rocksdb/wiki/Memory-usage-in-RocksDB) that use the majority of memory when RocksDB is deployed as a state backend: block cache, index and bloom filters, and MemTables.
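To make the mechanism described in the quoted section more concrete, below is a minimal sketch of the underlying RocksDB-level idea using the RocksJava API: one `LRUCache` shared across instances, a `WriteBufferManager` that charges memtable memory to that cache, and index & filter blocks cached there as well. This is only an illustration of the technique, not the code Flink uses internally, and the class and method names assume a recent RocksJava release.

```java
import org.rocksdb.BlockBasedTableConfig;
import org.rocksdb.LRUCache;
import org.rocksdb.Options;
import org.rocksdb.RocksDB;
import org.rocksdb.RocksDBException;
import org.rocksdb.WriteBufferManager;

public class SharedRocksDBMemorySketch {

    public static void main(String[] args) throws RocksDBException {
        RocksDB.loadLibrary();

        // One shared cache with the desired per-slot capacity (256 MB here).
        LRUCache sharedCache = new LRUCache(256 * 1024 * 1024L);

        // Charge memtable memory against the shared cache, so block cache,
        // index & filter blocks, and memtables all draw from one budget.
        WriteBufferManager writeBufferManager =
                new WriteBufferManager(128 * 1024 * 1024L, sharedCache);

        // Every RocksDB instance in the slot would be opened with options like these.
        BlockBasedTableConfig tableConfig = new BlockBasedTableConfig()
                .setBlockCache(sharedCache)
                // Keep index & bloom filter blocks in the shared cache as well,
                // so they count towards the same capacity.
                .setCacheIndexAndFilterBlocks(true);

        Options options = new Options()
                .setCreateIfMissing(true)
                .setWriteBufferManager(writeBufferManager)
                .setTableFormatConfig(tableConfig);

        try (RocksDB db = RocksDB.open(options, "/tmp/rocksdb-shared-cache-sketch")) {
            db.put("key".getBytes(), "value".getBytes());
        }
    }
}
```

With every instance in the slot wired to the same `LRUCache` and `WriteBufferManager`, the cache capacity becomes the effective upper bound on the three memory consumers the documentation lists.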

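The quoted hunk ends at "There exist two ways to enable this feature:" before listing them. As a hedged sketch only, the configuration keys below reflect the per-slot RocksDB memory options as introduced in Flink 1.10 (`state.backend.rocksdb.memory.managed` and `state.backend.rocksdb.memory.fixed-per-slot`); the key names here are assumptions, and the merged documentation is authoritative for the final names and defaults.

```java
import org.apache.flink.configuration.Configuration;

public class RocksDBMemoryOptionsSketch {

    public static void main(String[] args) {
        Configuration config = new Configuration();

        // Option 1 (assumed key name): derive the RocksDB budget from the slot's
        // managed memory, so all instances in the slot share one bounded cache.
        config.setString("state.backend.rocksdb.memory.managed", "true");

        // Option 2 (assumed key name): alternatively, fix an explicit per-slot
        // budget that the shared cache / write buffer manager must stay within.
        // config.setString("state.backend.rocksdb.memory.fixed-per-slot", "512mb");

        // In practice these entries live in flink-conf.yaml; printing here only
        // keeps the sketch self-contained and runnable.
        System.out.println(config);
    }
}
```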