[ 
https://issues.apache.org/jira/browse/FLINK-7289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16874305#comment-16874305
 ] 

Mike Kaplinskiy commented on FLINK-7289:
----------------------------------------

I've been battling the RocksDB memory usage on my kubernetes setup as well - TM 
containers would frequently chew through memory even in the course of a single 
job. Here's something that I found works for me - though it required me to 
recompile flink & frocksdb - adding a global WriteBufferManager.

https://github.com/facebook/rocksdb/wiki/Write-Buffer-Manager is very effective 
in limiting total RocksDB memory usage in a TaskManager if you can pass the 
same one to every RocksDB instance. Unfortunately frocksdb doesn't currently 
have these exposed via JNI so I opened 
https://github.com/dataArtisans/frocksdb/pull/4 to get those in there. After 
that, just using a custom options factory to set write buffer manager like:
{code}
public class LadderRocksDBOptionsFactory implements ConfigurableOptionsFactory {
    private static final WriteBufferManager writeBufferManager = new 
WriteBufferManager(1 << 30, new LRUCache(1 << 18));

    @Override
    public DBOptions createDBOptions(DBOptions currentOptions) {
        return currentOptions
                .setMaxBackgroundJobs(4)
                .setUseFsync(false)
                .setMaxOpenFiles(-1)
                .setWriteBufferManager(writeBufferManager);
    }

    @Override
    public ColumnFamilyOptions createColumnOptions(ColumnFamilyOptions 
currentOptions) {
        return currentOptions
                .setOptimizeFiltersForHits(true)
                .setMaxWriteBufferNumberToMaintain(3);
    }

    @Override
    public OptionsFactory configure(Configuration configuration) {
        return this;
    }
}
{code}

For one specific job I'm working on, I went from a OOM kill every 2-3 hours to 
at least 12 hours of no OOM events :).

> Memory allocation of RocksDB can be problematic in container environments
> -------------------------------------------------------------------------
>
>                 Key: FLINK-7289
>                 URL: https://issues.apache.org/jira/browse/FLINK-7289
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / State Backends
>    Affects Versions: 1.2.0, 1.3.0, 1.4.0
>            Reporter: Stefan Richter
>            Priority: Major
>
> Flink's RocksDB based state backend allocates native memory. The amount of 
> allocated memory by RocksDB is not under the control of Flink or the JVM and 
> can (theoretically) grow without limits.
> In container environments, this can be problematic because the process can 
> exceed the memory budget of the container, and the process will get killed. 
> Currently, there is no other option than trusting RocksDB to be well behaved 
> and to follow its memory configurations. However, limiting RocksDB's memory 
> usage is not as easy as setting a single limit parameter. The memory limit is 
> determined by an interplay of several configuration parameters, which is 
> almost impossible to get right for users. Even worse, multiple RocksDB 
> instances can run inside the same process and make reasoning about the 
> configuration also dependent on the Flink job.
> Some information about the memory management in RocksDB can be found here:
> https://github.com/facebook/rocksdb/wiki/Memory-usage-in-RocksDB
> https://github.com/facebook/rocksdb/wiki/RocksDB-Tuning-Guide
> We should try to figure out ways to help users in one or more of the 
> following ways:
> - Some way to autotune or calculate the RocksDB configuration.
> - Conservative default values.
> - Additional documentation.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to