[ https://issues.apache.org/jira/browse/FLINK-19125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yu Li updated FLINK-19125:
--------------------------
    Description: 
This ticket tracks the problem of memory fragmentation when launching the default Flink docker image.

In FLINK-18712, a user reported that when he submits a job with the RocksDB state backend to a k8s session cluster again and again, each time after the previous run finished, the memory usage of the task manager grows continuously until the process is OOM killed.
 I reproduced this problem with the official Flink docker image no matter how RocksDB is used (whether managed memory is enabled or not).
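
A rough reproduction loop looks like the following. This is only a sketch: it assumes {{state.backend: rocksdb}} is configured in {{flink-conf.yaml}}, a session cluster is already running, and the bundled WordCount example stands in for any job that finishes.

{code:bash}
# Submit a finite job to the session cluster over and over; the task
# manager's memory footprint grows a little with every completed run.
for i in $(seq 1 50); do
  ./bin/flink run ./examples/streaming/WordCount.jar
done
{code}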

I dug into the problem and found that it is due to memory fragmentation caused by {{glibc}}, which does not return freed memory to the kernel gracefully (please refer to the [glibc bugzilla|https://sourceware.org/bugzilla/show_bug.cgi?id=15321] and the [glibc manual|https://www.gnu.org/software/libc/manual/html_mono/libc.html#Freeing-after-Malloc] for details).
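
The symptom is easy to confirm by watching the resident set size of the task manager across job submissions: the RSS keeps climbing even though the JVM heap stays flat. A quick check, assuming the TM JVM runs as PID 1 inside the container:

{code:bash}
# Print the task manager's RSS every 10 seconds; it grows across
# repeated job runs while JVM heap usage stays stable.
while true; do grep VmRSS /proc/1/status; sleep 10; done
{code}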

I found that limiting {{MALLOC_ARENA_MAX}} to 2 could mitigate this problem (please refer to [choose-for-malloc_arena_max|https://devcenter.heroku.com/articles/tuning-glibc-memory-behavior#what-value-to-choose-for-malloc_arena_max] for more details).
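
A minimal sketch of applying that cap in a derived image (the base tag {{flink:1.11.1}} is only an example, not the proposed official change):

{code}
# Cap the number of glibc malloc arenas to reduce fragmentation.
FROM flink:1.11.1
ENV MALLOC_ARENA_MAX=2
{code}

On Kubernetes the same variable could also be injected through the pod spec instead of being baked into the image.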

And if we instead rebuild the docker image to allocate memory with jemalloc, the problem goes away:

{code}
# In the Dockerfile: install jemalloc and preload it for every process
RUN apt-get update && apt-get -y install libjemalloc-dev
ENV LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so
{code}
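
To double-check that the preload actually takes effect (again assuming the TM JVM is PID 1 in the container), look for the jemalloc mapping in the running process:

{code:bash}
# If LD_PRELOAD took effect, libjemalloc shows up in the memory maps.
grep jemalloc /proc/1/maps
{code}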

Jemalloc is designed to [emphasize fragmentation avoidance|https://github.com/jemalloc/jemalloc/wiki/Background#intended-use], so we might consider refactoring our Dockerfile to base the image on jemalloc and avoid the memory fragmentation altogether.

> Avoid memory fragmentation when running flink docker image
> ----------------------------------------------------------
>
>                 Key: FLINK-19125
>                 URL: https://issues.apache.org/jira/browse/FLINK-19125
>             Project: Flink
>          Issue Type: Improvement
>          Components: Deployment / Kubernetes, Runtime / State Backends
>    Affects Versions: 1.12.0, 1.11.1
>            Reporter: Yun Tang
>            Assignee: Yun Tang
>            Priority: Major
>             Fix For: 1.12.0, 1.11.3
>


