[ 
https://issues.apache.org/jira/browse/FLINK-7289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16905473#comment-16905473
 ] 

Mike Kaplinskiy commented on FLINK-7289:
----------------------------------------

[~aiyangar] I was testing on an unreleased build of 1.9 with a fork of frocksdb 
that includes the patch above. The core issue is that Flink creates one RocksDB 
instance per subtask (i.e. per shard of a task). In my case there were 
frequently more than 50 separate RocksDB databases per TM. To limit memory 
across all of these databases, you have to write code that passes the same 
{{WriteBufferManager}} instance to every one of them - it is not just a config 
setting.

You'll need to implement a {{ConfigurableOptionsFactory}} (a class like the one 
I posted above) that uses a global {{WriteBufferManager}}. You'll then need to 
put it on the Flink boot classpath - not on your application code classpath, 
because a copy loaded by the per-job classloader won't be global. You should 
see an improvement there. A rough sketch is below.
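
For reference, something along these lines is roughly what I mean. This is only 
a sketch: the class name and the 256 MB / 1 GB sizes are placeholders, it 
assumes the Flink 1.9 {{OptionsFactory}} method signatures, and 
{{DBOptions.setWriteBufferManager}} is only available with a RocksDB/frocksdb 
build that carries the patch above.

{code:java}
import org.apache.flink.configuration.Configuration;
import org.apache.flink.contrib.streaming.state.ConfigurableOptionsFactory;
import org.apache.flink.contrib.streaming.state.OptionsFactory;

import org.rocksdb.Cache;
import org.rocksdb.ColumnFamilyOptions;
import org.rocksdb.DBOptions;
import org.rocksdb.LRUCache;
import org.rocksdb.WriteBufferManager;

/**
 * Shares a single WriteBufferManager (backed by a single block cache) across
 * every RocksDB instance created in this JVM, so memtable memory is capped for
 * all subtasks of a TM together rather than per database.
 */
public class SharedWriteBufferOptionsFactory implements ConfigurableOptionsFactory {

    // JVM-wide singletons - this is what makes the limit global. Sizes are placeholders.
    private static final Cache SHARED_BLOCK_CACHE =
            new LRUCache(1024L * 1024 * 1024);                              // 1 GB block cache
    private static final WriteBufferManager SHARED_WRITE_BUFFER_MANAGER =
            new WriteBufferManager(256L * 1024 * 1024, SHARED_BLOCK_CACHE); // 256 MB write buffers

    @Override
    public DBOptions createDBOptions(DBOptions currentOptions) {
        // Every database Flink opens gets the same manager, so their memtables
        // draw from one shared budget instead of growing independently.
        return currentOptions.setWriteBufferManager(SHARED_WRITE_BUFFER_MANAGER);
    }

    @Override
    public ColumnFamilyOptions createColumnFamilyOptions(ColumnFamilyOptions currentOptions) {
        return currentOptions;
    }

    @Override
    public OptionsFactory configure(Configuration configuration) {
        // The sizes above could be read from the Flink configuration here instead.
        return this;
    }
}
{code}

Drop the jar containing that class into Flink's {{lib/}} directory (so it is 
loaded by the boot classloader rather than the per-job user-code classloader) 
and point {{state.backend.rocksdb.options-factory}} at it in 
{{flink-conf.yaml}}.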

If you'd like to debug native memory usage, I suggest jemalloc, which can output 
native memory profiles that show allocation sites.

> Memory allocation of RocksDB can be problematic in container environments
> -------------------------------------------------------------------------
>
>                 Key: FLINK-7289
>                 URL: https://issues.apache.org/jira/browse/FLINK-7289
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / State Backends
>    Affects Versions: 1.2.0, 1.3.0, 1.4.0
>            Reporter: Stefan Richter
>            Priority: Major
>         Attachments: completeRocksdbConfig.txt
>
>
> Flink's RocksDB-based state backend allocates native memory. The amount of 
> memory allocated by RocksDB is not under the control of Flink or the JVM and 
> can (theoretically) grow without limits.
> In container environments, this can be problematic because the process can 
> exceed the memory budget of the container, and the process will get killed. 
> Currently, there is no other option than trusting RocksDB to be well behaved 
> and to follow its memory configurations. However, limiting RocksDB's memory 
> usage is not as easy as setting a single limit parameter. The memory limit is 
> determined by an interplay of several configuration parameters, which is 
> almost impossible to get right for users. Even worse, multiple RocksDB 
> instances can run inside the same process and make reasoning about the 
> configuration also dependent on the Flink job.
> Some information about the memory management in RocksDB can be found here:
> https://github.com/facebook/rocksdb/wiki/Memory-usage-in-RocksDB
> https://github.com/facebook/rocksdb/wiki/RocksDB-Tuning-Guide
> We should try to figure out ways to help users in one or more of the 
> following ways:
> - Some way to autotune or calculate the RocksDB configuration.
> - Conservative default values.
> - Additional documentation.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
