[jira] [Commented] (FLINK-19125) Avoid memory fragmentation when running flink docker image

Nico Kruber (Jira) Mon, 23 Nov 2020 05:36:34 -0800


    [ 
https://issues.apache.org/jira/browse/FLINK-19125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17237371#comment-17237371
 ]


Nico Kruber commented on FLINK-19125:
-------------------------------------

Yes, probably - we should put it under some "Advanced Configuration" heading 
though since users shouldn't necessarily need to interact with this option.

Or we keep this flag undocumented in the docs but well-documented in the 
Dockerfile (we do that for some very advanced settings occasionally) but in 
that case, remove the sentence from the release notes that refers to the docs.

Your call...

> Avoid memory fragmentation when running flink docker image
> ----------------------------------------------------------
>
>                 Key: FLINK-19125
>                 URL: https://issues.apache.org/jira/browse/FLINK-19125
>             Project: Flink
>          Issue Type: Improvement
>          Components: Deployment / Kubernetes, Runtime / State Backends
>    Affects Versions: 1.12.0, 1.11.1
>            Reporter: Yun Tang
>            Assignee: Yun Tang
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.12.0, 1.11.3
>
>
> This ticket tracks the problem of memory fragmentation when launching the 
> default Flink docker image.
> In FLINK-18712, a user reported that when a job using the RocksDB state 
> backend is submitted to a k8s session cluster again and again, each time 
> after the previous run has finished, the memory usage of the task manager 
> grows continuously until it is OOM killed. 
> I reproduced this problem with the official Flink docker image no matter how 
> RocksDB is used (whether managed memory is enabled or not).
> I dug into the problem and found that it is due to memory fragmentation 
> caused by {{glibc}}, which does not return memory to the kernel gracefully 
> (please refer to [glibc 
> bugzilla|https://sourceware.org/bugzilla/show_bug.cgi?id=15321] and [glibc 
> manual|https://www.gnu.org/software/libc/manual/html_mono/libc.html#Freeing-after-Malloc]).
> I found that limiting MALLOC_ARENA_MAX to 2 could mitigate this problem 
> (please refer to 
> [choose-for-malloc_arena_max|https://devcenter.heroku.com/articles/tuning-glibc-memory-behavior#what-value-to-choose-for-malloc_arena_max]
>  for more details).
> And if we choose to use jemalloc to allocate memory by building another 
> docker image, the problem is gone. 
> {code}
> # Dockerfile fragment: install jemalloc and preload it into every process
> RUN apt-get update && apt-get -y install libjemalloc-dev
> ENV LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so
> {code}
> Jemalloc intends to [emphasize fragmentation 
> avoidance|https://github.com/jemalloc/jemalloc/wiki/Background#intended-use], 
> and we might consider refactoring our Dockerfile to be based on jemalloc to 
> avoid memory fragmentation.
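
For illustration only, the MALLOC_ARENA_MAX mitigation mentioned in the description could be applied in a container entrypoint roughly as follows. This is a minimal sketch, not the Flink image's actual script: the helper name pick_arena_max is hypothetical, and the default of 2 is simply the value suggested in the ticket. glibc reads MALLOC_ARENA_MAX when a process starts, so it has to be exported before the JVM is launched.

```shell
#!/bin/sh
# Hypothetical helper (not part of the Flink image): choose the arena cap,
# keeping any value the user already set and falling back to 2 otherwise.
pick_arena_max() {
    # $1 is the current setting; it may be empty.
    printf '%s\n' "${1:-2}"
}

# Export the cap so that any process started afterwards (e.g. the JVM)
# inherits it; glibc reads the variable at process startup.
MALLOC_ARENA_MAX="$(pick_arena_max "${MALLOC_ARENA_MAX:-}")"
export MALLOC_ARENA_MAX
```

Keeping an already-set value lets users override the cap from outside the image without having to edit the script.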


