[ 
https://issues.apache.org/jira/browse/FLINK-7289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16106133#comment-16106133
 ] 

Vinay commented on FLINK-7289:
------------------------------

Hi Stefan,

I have mainly used RocksDB on EMR backed up by SSD's and 122GB memory. 
Although  FLASH_SSD_OPTION is good, it does not provide control over the amount 
of memory to be used. So I had tuned some parameters with the below 
configurations :

{code:java}

DBOptions:
     (along with the FLASH_SSD_OPTIONS add the following)
     maxBackgroundCompactions(4)
    
ColumnFamilyOptions:
  max_buffer_size : 512 MB
  block_cache_size : 128 MB
  max_write_buffer_number : 5
  minimum_buffer_number_to_merge : 2
  cacheIndexAndFilterBlocks : true
  optimizeFilterForHits: true
{code}

According to the documentation when {code:java}  optimizeFilterForHits: true 
{code} is set, RocksDB will not build bloom filters on the last level which 
contains 90% of DB. Thus the memory usage for bloom filters will be 10x less.

As RocksDB uses a lot of memory , if we cancel the job in between the memory 
used is not reclaimed. For Example: assuming that the job is running for 1 hour 
and the memory used is 50GB , now when we cancel the job from UI the memory is 
not reclaimed.
I have observed this case when I had run the job on YARN.

I order to reclaim the memory I had to manually run the following command on 
each node of EMR:
{code:java}
sync; echo 3 > /proc/sys/vm/drop_caches
sync; echo 2 > /proc/sys/vm/drop_caches
sync; echo 1 > /proc/sys/vm/drop_caches
{code}


> Memory allocation of RocksDB can be problematic in container environments
> -------------------------------------------------------------------------
>
>                 Key: FLINK-7289
>                 URL: https://issues.apache.org/jira/browse/FLINK-7289
>             Project: Flink
>          Issue Type: Improvement
>          Components: State Backends, Checkpointing
>    Affects Versions: 1.2.0, 1.3.0, 1.4.0
>            Reporter: Stefan Richter
>
> Flink's RocksDB based state backend allocates native memory. The amount of 
> allocated memory by RocksDB is not under the control of Flink or the JVM and 
> can (theoretically) grow without limits.
> In container environments, this can be problematic because the process can 
> exceed the memory budget of the container, and the process will get killed. 
> Currently, there is no other option than trusting RocksDB to be well behaved 
> and to follow its memory configurations. However, limiting RocksDB's memory 
> usage is not as easy as setting a single limit parameter. The memory limit is 
> determined by an interplay of several configuration parameters, which is 
> almost impossible to get right for users. Even worse, multiple RocksDB 
> instances can run inside the same process and make reasoning about the 
> configuration also dependent on the Flink job.
> Some information about the memory management in RocksDB can be found here:
> https://github.com/facebook/rocksdb/wiki/Memory-usage-in-RocksDB
> https://github.com/facebook/rocksdb/wiki/RocksDB-Tuning-Guide
> We should try to figure out ways to help users in one or more of the 
> following ways:
> - Some way to autotune or calculate the RocksDB configuration.
> - Conservative default values.
> - Additional documentation.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Reply via email to