Yun Tang created FLINK-19125:
--------------------------------
Summary: Avoid memory fragmentation when running flink docker image
Key: FLINK-19125
URL: https://issues.apache.org/jira/browse/FLINK-19125
Project: Flink
Issue Type: Improvement
Components: Deployment / Kubernetes, Runtime / State Backends
Affects Versions: 1.11.1
Reporter: Yun Tang
This ticket tracks the memory fragmentation problem that occurs when running the
default Flink Docker image.
In FLINK-18712, a user reported that when jobs using the RocksDB state backend
are submitted to a k8s session cluster repeatedly, each one after the previous
job has finished, the memory usage of the task manager grows continuously until
the process is OOM-killed.
I reproduced this problem with the official Flink Docker image, regardless of
how RocksDB is configured (with or without managed memory).
I dug into the problem and found that it is caused by memory fragmentation in
{{glibc}}, which does not readily return freed memory to the kernel (please
refer to the
[glibc bugzilla|https://sourceware.org/bugzilla/show_bug.cgi?id=15321] and the
[glibc manual|https://www.gnu.org/software/libc/manual/html_mono/libc.html#Freeing-after-Malloc]).
I found that limiting {{MALLOC_ARENA_MAX}} to 2 mitigates this problem (please
refer to
[choose-for-malloc_arena_max|https://devcenter.heroku.com/articles/tuning-glibc-memory-behavior#what-value-to-choose-for-malloc_arena_max]
for more details).
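As a minimal sketch of that mitigation, the variable could be set in a derived
image. The base image tag below is an assumption for illustration only; the
same variable could instead be set in the pod spec or passed to the container
at start time.
{code}
# Assumption: build on top of the official image; the tag is illustrative.
FROM flink:1.11.1
# Cap the number of glibc malloc arenas to limit fragmentation.
ENV MALLOC_ARENA_MAX=2
{code}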
If we instead rebuild the Docker image to use jemalloc as the memory allocator,
the problem goes away:
{code:java}
# Install jemalloc (Debian-based image), then preload it for every process.
RUN apt-get update && apt-get -y install libjemalloc-dev
ENV LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so
{code}
jemalloc is designed to
[emphasize fragmentation avoidance|https://github.com/jemalloc/jemalloc/wiki/Background#intended-use],
so we might consider refactoring our Dockerfile to use jemalloc and avoid
memory fragmentation.