[ 
https://issues.apache.org/jira/browse/FLINK-32577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yunhong Zheng updated FLINK-32577:
----------------------------------
    Description: This issue is a sub-issue of FLINK-18356.  (was: This issue is 
a sub-issue of FLINK-18356.

When I ran mvn verify for flink-table-planner in Azure CI and on my own 
machine, I found that the heap and non-heap memory of the JVM were stable and 
within the normal range. However, the total memory usage ({*}RES{*}) of the 
fork processes was very high, as shown in the following figure (PIDs 2958793 
and 2958794):

!image-2023-07-11-19-28-52-851.png|width=537,height=245!

I then looked more closely at the memory mappings of these two processes:

 
{code:java}
pmap -p 2958793 {code}
I found a large number of memory regions with a size close to *64MB* each 
(more than 200 of them), which points to memory fragmentation:

 

!image-2023-07-11-19-35-54-530.png|width=237,height=413!
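
The count of near-64MB regions can also be taken from the pmap text directly 
rather than read off a screenshot. The following is a minimal sketch, not the 
method used above: the function name, the 4MB tolerance and the sample output 
are all assumptions made for illustration.

```python
import re

# Matches a mapping line of plain `pmap` output: address, size in KiB, permissions.
MAPPING_LINE = re.compile(r"^([0-9a-fA-F]+)\s+(\d+)K\s+(\S+)")


def count_regions_near(pmap_text, target_kib=64 * 1024, tolerance_kib=4 * 1024):
    """Count mappings whose size is within tolerance_kib of target_kib.

    The 4 MiB tolerance is an assumed threshold for "close to 64MB".
    """
    count = 0
    for line in pmap_text.splitlines():
        match = MAPPING_LINE.match(line.strip())
        if match is None:
            continue  # header line ("PID: command") or the trailing "total" line
        size_kib = int(match.group(2))
        if abs(size_kib - target_kib) <= tolerance_kib:
            count += 1
    return count


# Hypothetical pmap output, shortened for illustration.
sample = """\
2958793:   java
0000000000400000      4K r-x-- java
00007f1000000000  65536K rw---   [ anon ]
00007f1004000000  65404K -----   [ anon ]
00007f1008000000    132K rw---   [ anon ]
 total           131076K
"""

print(count_regions_near(sample))  # prints 2
```

On a real process, the text could come from 
`subprocess.run(["pmap", "-p", pid], capture_output=True, text=True).stdout`.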

Based on past experience, this is likely the classic memory fragmentation 
problem caused by the *glibc* memory allocator when running *JDK8.* So we 
downloaded *libjemalloc* and set the following environment variable:

 
{code:java}
export LD_PRELOAD=${JAVA_HOME}/lib/amd64/libjemalloc.so.2{code}
After that, the overall memory usage of the fork processes became stable and 
met expectations (5GB):

 

!image-2023-07-11-19-41-18-626.png|width=488,height=208!

!image-2023-07-11-19-41-37-105.png|width=228,height=287!
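
To confirm that the preloaded allocator is actually mapped into a fork 
process, one option is to look for it in the process's /proc/<pid>/maps file. 
The sketch below is illustrative only: the function name and the sample maps 
content are hypothetical, and it assumes the library file name contains 
"jemalloc".

```python
def loaded_allocator_libs(maps_text, needle="jemalloc"):
    """Return the distinct mapped file paths whose name contains needle."""
    paths = set()
    for line in maps_text.splitlines():
        # Fields: address, perms, offset, dev, inode, and an optional pathname.
        fields = line.split(None, 5)
        if len(fields) == 6 and needle in fields[5]:
            paths.add(fields[5].strip())
    return sorted(paths)


# Hypothetical excerpt of /proc/<pid>/maps, for illustration only.
sample_maps = """\
7f2c4a000000-7f2c4a021000 rw-p 00000000 00:00 0
7f2c4c1d2000-7f2c4c1d9000 r--p 00000000 08:01 131 /usr/lib/x86_64-linux-gnu/libjemalloc.so.2
7f2c4c1d9000-7f2c4c25e000 r-xp 00007000 08:01 131 /usr/lib/x86_64-linux-gnu/libjemalloc.so.2
7f2c4c400000-7f2c4c428000 r--p 00000000 08:01 97 /usr/lib/x86_64-linux-gnu/libc.so.6
"""

print(loaded_allocator_libs(sample_maps))
```

For a live process, the input would be the contents of `/proc/<pid>/maps`; an 
empty result would suggest that LD_PRELOAD did not take effect for that 
process.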

Fixing this problem requires modifying the CI execution 
[Docker image|https://github.com/flink-ci/flink-ci-docker] to replace the 
*glibc* allocator with *libjemalloc*, as in FLINK-19125, cc [~chesnay]:
{code:java}
RUN apt-get -y install libjemalloc-dev

ENV LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so {code}
I have opened a new Jira (FLINK-32577) to track and fix this issue. cc 
[~mapohl]  [~jark].

 )

> Avoid memory fragmentation when running CI for flink-table-planner module
> -------------------------------------------------------------------------
>
>                 Key: FLINK-32577
>                 URL: https://issues.apache.org/jira/browse/FLINK-32577
>             Project: Flink
>          Issue Type: Improvement
>          Components: Build System / CI, Table SQL / Planner
>    Affects Versions: 1.18.0
>            Reporter: Yunhong Zheng
>            Priority: Major
>             Fix For: 1.18.0
>
>
> This issue is a sub-issue of FLINK-18356.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
