[ https://issues.apache.org/jira/browse/FLINK-18712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17189138#comment-17189138 ]

Yun Tang commented on FLINK-18712:
----------------------------------

I used a [k8s session cluster|https://ci.apache.org/projects/flink/flink-docs-release-1.11/ops/deployment/kubernetes.html#deploy-session-cluster] with the flink-1.11.1 image to reproduce this problem.
The root cause is memory fragmentation in {{glibc}}'s allocator. You can refer to https://www.gnu.org/software/libc/manual/html_mono/libc.html#Freeing-after-Malloc and https://sourceware.org/bugzilla/show_bug.cgi?id=15321 for more information.

There are several ways to fix this:
* A quick but not very clean solution: limit the number of glibc malloc arenas by setting {{MALLOC_ARENA_MAX}} to {{2}} in the environment section of the k8s yaml for taskmanagers.
   
{code:yaml}
        env:
        - name: MALLOC_ARENA_MAX
          value: "2"
{code}

   You can refer to https://devcenter.heroku.com/articles/tuning-glibc-memory-behavior#what-value-to-choose-for-malloc_arena_max for guidance on choosing this value.

* A more general solution: rebuild the image to install {{libjemalloc-dev}} and add {{libjemalloc.so}} to {{LD_PRELOAD}} in the k8s yaml for taskmanagers. I did not try tcmalloc, which might work as well.

I tried both of the above solutions and they worked quite well: memory usage no longer grew without bound.

I'll create another ticket to track the solution for this issue.
  

> Flink RocksDB statebackend memory leak issue 
> ---------------------------------------------
>
>                 Key: FLINK-18712
>                 URL: https://issues.apache.org/jira/browse/FLINK-18712
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / State Backends
>    Affects Versions: 1.10.0
>            Reporter: Farnight
>            Priority: Critical
>             Fix For: 1.12.0
>
>
> When using RocksDB as our statebackend, we found it leads to a memory leak 
> when restarting a job (manually or in a recovery case).
>  
> How to reproduce:
>  # increase the RocksDB block cache size (e.g. 1 GB); this makes it easier 
> to monitor and reproduce.
>  # start a job using the RocksDB statebackend.
>  # when the RocksDB block cache reaches its maximum size, restart the job, 
> and monitor the memory usage (k8s pod working set) of the TM.
>  # go through steps 2-3 a few more times, and memory will keep rising.
>  
> Any solution or suggestion for this? Thanks!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)