[
https://issues.apache.org/jira/browse/FLINK-7289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16874305#comment-16874305
]
Mike Kaplinskiy commented on FLINK-7289:
----------------------------------------
I've been battling RocksDB memory usage on my Kubernetes setup as well - TM
containers would frequently chew through their memory budget even in the course
of a single job. Here's something that I found works for me - though it
required me to recompile flink & frocksdb - adding a global WriteBufferManager.
https://github.com/facebook/rocksdb/wiki/Write-Buffer-Manager is very effective
at limiting total RocksDB memory usage in a TaskManager, provided you can pass
the same manager to every RocksDB instance. Unfortunately frocksdb doesn't
currently expose it via JNI, so I opened
https://github.com/dataArtisans/frocksdb/pull/4 to get that in there. After
that, it's just a matter of using a custom options factory to set the write
buffer manager, like:
{code}
public class LadderRocksDBOptionsFactory implements ConfigurableOptionsFactory {

    // One WriteBufferManager shared by every RocksDB instance in this TM:
    // caps total memtable memory at 1 GiB, backed by a 256 KiB LRUCache
    // for its accounting.
    private static final WriteBufferManager writeBufferManager =
            new WriteBufferManager(1 << 30, new LRUCache(1 << 18));

    @Override
    public DBOptions createDBOptions(DBOptions currentOptions) {
        return currentOptions
                .setMaxBackgroundJobs(4)
                .setUseFsync(false)
                .setMaxOpenFiles(-1)
                // Every instance gets the same manager, so the limit is global.
                .setWriteBufferManager(writeBufferManager);
    }

    @Override
    public ColumnFamilyOptions createColumnOptions(ColumnFamilyOptions currentOptions) {
        return currentOptions
                .setOptimizeFiltersForHits(true)
                .setMaxWriteBufferNumberToMaintain(3);
    }

    @Override
    public OptionsFactory configure(Configuration configuration) {
        return this;
    }
}
{code}
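To get Flink to load a factory like this for every job, it can be registered in flink-conf.yaml (if I recall the key correctly - and the class name here is of course just my example; the jar has to be on the TM classpath, e.g. in lib/):
{code}
# flink-conf.yaml - point the RocksDB state backend at the custom factory
state.backend.rocksdb.options-factory: com.example.LadderRocksDBOptionsFactory
{code}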
For one specific job I'm working on, I went from an OOM kill every 2-3 hours to
at least 12 hours with no OOM events :).
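For context on why sharing one manager matters: per the RocksDB memory-usage wiki linked in the issue below, the memtables of each instance alone can take roughly write_buffer_size * max_write_buffer_number bytes per column family, and a TM can host many RocksDB instances. A quick back-of-the-envelope sketch (assumed RocksDB defaults of 64 MiB buffers and 2 buffers per family; the helper class is hypothetical, not a Flink or RocksDB API):
{code}
public class WriteBufferBudget {

    // Rough memtable footprint of one RocksDB instance:
    // write_buffer_size * max_write_buffer_number * number of column families.
    public static long perInstanceMemtableBytes(long writeBufferSize,
                                                int maxWriteBufferNumber,
                                                int numColumnFamilies) {
        return writeBufferSize * maxWriteBufferNumber * numColumnFamilies;
    }

    public static void main(String[] args) {
        // Assumed defaults: 64 MiB write buffers, 2 per column family.
        long perInstance = perInstanceMemtableBytes(64L << 20, 2, 1);
        // prints "128 MiB of memtables per RocksDB instance"
        System.out.println(perInstance / (1 << 20)
                + " MiB of memtables per RocksDB instance");
        // Multiply by the number of instances per TM and it's easy to blow a
        // container budget - which is exactly what the shared manager caps.
    }
}
{code}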
> Memory allocation of RocksDB can be problematic in container environments
> -------------------------------------------------------------------------
>
> Key: FLINK-7289
> URL: https://issues.apache.org/jira/browse/FLINK-7289
> Project: Flink
> Issue Type: Improvement
> Components: Runtime / State Backends
> Affects Versions: 1.2.0, 1.3.0, 1.4.0
> Reporter: Stefan Richter
> Priority: Major
>
> Flink's RocksDB based state backend allocates native memory. The amount of
> allocated memory by RocksDB is not under the control of Flink or the JVM and
> can (theoretically) grow without limits.
> In container environments, this can be problematic because the process can
> exceed the memory budget of the container, and the process will get killed.
> Currently, there is no other option than trusting RocksDB to be well behaved
> and to follow its memory configurations. However, limiting RocksDB's memory
> usage is not as easy as setting a single limit parameter. The memory limit is
> determined by an interplay of several configuration parameters, which is
> almost impossible to get right for users. Even worse, multiple RocksDB
> instances can run inside the same process and make reasoning about the
> configuration also dependent on the Flink job.
> Some information about the memory management in RocksDB can be found here:
> https://github.com/facebook/rocksdb/wiki/Memory-usage-in-RocksDB
> https://github.com/facebook/rocksdb/wiki/RocksDB-Tuning-Guide
> We should try to figure out ways to help users in one or more of the
> following ways:
> - Some way to autotune or calculate the RocksDB configuration.
> - Conservative default values.
> - Additional documentation.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)