[
https://issues.apache.org/jira/browse/FLINK-39924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ASF GitHub Bot updated FLINK-39924:
-----------------------------------
Labels: pull-request-available (was: )
> Memory fragmentation from jemalloc misconfiguration
> ---------------------------------------------------
>
> Key: FLINK-39924
> URL: https://issues.apache.org/jira/browse/FLINK-39924
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Configuration
> Affects Versions: 2.0.2, 2.2.1, 1.20.5, 2.1.3
> Reporter: Keith Lee
> Priority: Critical
> Labels: pull-request-available
>
> We observed excessive memory fragmentation in production. Using malloc_stats,
> we identified the most extreme case of fragmentation at 3.91 GB (10.01 GB
> Resident - 6.1 GB Active), which is significant given that the pod has a
> limit of 16 GB.
> This was caused by the {*}jemalloc arena count being configured higher than
> the expected default of 4 x number_of_cpu_cores{*}.
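> As a quick check of the arithmetic above, the following minimal sketch
> recomputes the fragmentation figure from the Resident and Active values
> reported by malloc_stats. The function name is hypothetical and only the
> three numbers from this report are used.
>
```python
def fragmentation_gb(resident_gb: float, active_gb: float) -> float:
    """Memory held by the allocator (resident) but not in active use."""
    return resident_gb - active_gb


POD_LIMIT_GB = 16.0  # pod memory limit stated in this report

frag = fragmentation_gb(resident_gb=10.01, active_gb=6.1)
share_of_limit = frag / POD_LIMIT_GB

print(f"fragmentation: {frag:.2f} GB ({share_of_limit:.1%} of the pod limit)")
```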
> h2. Why is high jemalloc arena count bad?
> A higher number of arenas reduces thread contention during malloc at the cost
> of higher memory fragmentation and overall memory usage. Memory freed by the
> process back to jemalloc is less likely to be reused because it is spread
> across more arenas, and it has to go through a decay period of 10 seconds
> before being returned to the operating system.
> The fragmentation leaves less memory for the page cache, which hurts
> performance and makes an OOMKill more likely.
> h2. Root cause
> By default, jemalloc sets narenas to 4 * number_of_cpu_cores. However, the
> value of number_of_cpu_cores is taken from the host machine and not from
> the CPU resource configured for the pod. The misconfiguration occurs when
> the host machine's CPU core count and the pod's CPU resource configuration
> do not match.
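> To illustrate the mismatch described above, here is a minimal sketch that
> compares the arena count derived from the host core count with one derived
> from the container CPU limit. It assumes cgroup v2 with the limit exposed at
> /sys/fs/cgroup/cpu.max; the helper names are hypothetical, and this is not
> necessarily how the linked pull request addresses the issue. The narenas
> option passed through MALLOC_CONF is a standard jemalloc setting.
>
```python
import math
import os
from pathlib import Path


def default_narenas(cpu_cores: int) -> int:
    """Arena count as described in this report: 4 x CPU cores."""
    return 4 * cpu_cores


def pod_cpu_limit(cpu_max_path: str = "/sys/fs/cgroup/cpu.max"):
    """Return the cgroup v2 CPU limit in cores, or None if unlimited or unavailable.

    The file holds "<quota> <period>" in microseconds, or "max <period>".
    The path is an assumption (cgroup v2, default mount point).
    """
    try:
        quota, period = Path(cpu_max_path).read_text().split()
    except (OSError, ValueError):
        return None
    if quota == "max":
        return None
    return int(quota) / int(period)


host_cores = os.cpu_count() or 1
limit = pod_cpu_limit()
# Fall back to the host core count when no container limit is visible.
effective_cores = max(1, math.ceil(limit)) if limit else host_cores

print(f"narenas from host cores: {default_narenas(host_cores)}")
print(f"narenas from pod limit:  {default_narenas(effective_cores)}")
print(f"MALLOC_CONF=narenas:{default_narenas(effective_cores)}")
```
>
> For example, a pod limited to 1 CPU on a 16-core host would get 64 arenas
> from the host-based calculation, versus 4 from the limit-based one.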
> h2. Reproduction and confirmation
> Steps to reproduce can be found here:
> [https://github.com/leekeiabstraction/flink-docker/tree/reproduce-jemalloc-fragmentation/reproduce-jemalloc-fragmentation]
> The reproduction was run on a 16-core Mac Studio. We found a reduction of
> 10.7 % in resident set size (highest anon) and a slight performance
> improvement when narenas is configured correctly:
> {{============================================================}}
> {{[+] Per-image summary:}}
> {{============================================================}}
> {{ image                          highest anon  avg anon    lowest write-recs  avg write-recs}}
> {{ flink:2.2.1-scala_2.12-java17  1679.3 MiB    1522.6 MiB  186901             207614}}
> {{ flink-2.2.1-narenas4           1499.7 MiB    1301.9 MiB  200945             213198}}
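> The relative differences between the two images can be recomputed from the
> summary table with a short sketch like the one below. The dictionary keys
> and the function name are chosen here for illustration; the numbers are
> copied from the table.
>
```python
def pct_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100


# Values from the per-image summary (anon in MiB, write-recs as counts).
baseline = {"highest anon": 1679.3, "avg anon": 1522.6,
            "lowest write-recs": 186901, "avg write-recs": 207614}
narenas4 = {"highest anon": 1499.7, "avg anon": 1301.9,
            "lowest write-recs": 200945, "avg write-recs": 213198}

changes = {metric: pct_change(baseline[metric], narenas4[metric])
           for metric in baseline}

for metric, change in changes.items():
    print(f"{metric:<18} {change:+.1f} %")
```
>
> The highest anon figure reproduces the 10.7 % reduction quoted above.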
--
This message was sent by Atlassian Jira
(v8.20.10#820010)