[
https://issues.apache.org/jira/browse/HADOOP-17209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17178225#comment-17178225
]
Sean Chow commented on HADOOP-17209:
------------------------------------
To dig into this, I enabled the JVM's {{Native Memory Tracking}} and found
it is a JNI (Java Native Interface) call issue:
{code:java}
[0x00007f2a7b5e2ea5] jni_GetIntArrayElements+0x2b5
(malloc=2444372KB +225563KB #433115199 +31527004)
[0x00007f2a7b5e2ea5] jni_GetIntArrayElements+0x2b5
[0x00007f2a4c7c35b9] getOutputs+0x79
(malloc=2457719KB +227373KB #434823615 +31758718)
[0x00007f2a7b5e2ea5] jni_GetIntArrayElements+0x2b5
[0x00007f2a4c7c34c9] getInputs+0x79
(malloc=8479302KB +618477KB #434823615 +31758718)
{code}
The attached file datanode.202137.detail_diff.5.txt is the output of {{jcmd 202137
VM.native_memory detail.diff}}, which tracks the diff against the baseline over roughly 24 hours.
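For reference, this is roughly how such an NMT diff is produced (a sketch; {{<pid>}} stands for the datanode process id, and the first line is a JVM startup flag, not a shell command):

{code:bash}
# Start the JVM with native memory tracking enabled (adds some overhead):
#   -XX:NativeMemoryTracking=detail

# Record a baseline, then diff against it after the leak has grown:
jcmd <pid> VM.native_memory baseline
jcmd <pid> VM.native_memory detail.diff
{code}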
Okay, it's clear now. Memory obtained through the JNI function {{GetIntArrayElements}}
must be released manually, according to
[https://www.eg.bucknell.edu/~mead/Java-tutorial/native1.1/implementing/array.html]
{quote}
Similar to the Get<type>ArrayElements functions, the JNI provides a set of
Release<type>ArrayElements functions. Do not forget to call the appropriate
Release<type>ArrayElements function, such as ReleaseIntArrayElements. If you
forget to make this call, the array stays pinned for an extended period of
time, or the Java Virtual Machine is unable to reclaim the memory used to
store the nonmovable copy of the array.
{quote}
But I didn't find any call to {{ReleaseIntArrayElements}} in the erasure-coding
native code.
I have a patch running on my datanode. If it works well, I will attach it soon.
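To make the leak pattern concrete without requiring a JVM, here is a plain-C analogy of the Get/Release pairing (all names here are hypothetical stand-ins, not the actual Hadoop code): {{GetIntArrayElements}} may hand native code a malloc'ed copy of the Java array, and that copy stays allocated until the matching {{ReleaseIntArrayElements}} call.

{code:c}
#include <stdlib.h>
#include <string.h>

/* Analogy of JNI's GetIntArrayElements: returns a malloc'ed copy of
 * the caller's array that the caller must release explicitly. */
static int *get_int_array_elements(const int *src, size_t n) {
    int *copy = malloc(n * sizeof(int));
    if (copy != NULL)
        memcpy(copy, src, n * sizeof(int));
    return copy;
}

/* The matching release. Skipping this on any code path leaks the
 * copy on every call -- the pattern behind this issue. */
static void release_int_array_elements(int *copy) {
    free(copy);
}

/* Correct usage: every Get is paired with a Release. */
int sum_with_release(const int *src, size_t n) {
    int *elems = get_int_array_elements(src, n);
    if (elems == NULL)
        return -1;
    int sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += elems[i];
    release_int_array_elements(elems);  /* always pair Get with Release */
    return sum;
}
{code}

In real JNI code, the release side is {{(*env)->ReleaseIntArrayElements(env, array, elems, mode)}}, where the mode controls whether changes are copied back to the Java array; omitting it on any return path leaks the native copy, as the NMT diff above shows.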
> ErasureCode native library memory leak
> --------------------------------------
>
> Key: HADOOP-17209
> URL: https://issues.apache.org/jira/browse/HADOOP-17209
> Project: Hadoop Common
> Issue Type: Bug
> Components: native
> Affects Versions: 3.3.0, 3.2.1, 3.1.3
> Reporter: Sean Chow
> Assignee: Sean Chow
> Priority: Major
> Attachments: image-2020-08-15-18-25-48-830.png
>
>
> We use both {{apache-hadoop-3.1.3}} and {{CDH-6.1.1-1.cdh6.1.1.p0.875250}}
> HDFS in production, and both show resident memory growing beyond the {{-Xmx}}
> value.
> These are the JVM options:
>
> {code:java}
> -Dproc_datanode -Dhdfs.audit.logger=INFO,RFAAUDIT
> -Dsecurity.audit.logger=INFO,RFAS -Djava.net.preferIPv4Stack=true
> -Xms8589934592 -Xmx8589934592 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC
> -XX:CMSInitiatingOccupancyFraction=70 -XX:+CMSParallelRemarkEnabled
> -XX:+HeapDumpOnOutOfMemoryError ...{code}
>
> The max JVM heap size is 8 GB, but the datanode's RSS is 48 GB:
> {code:java}
> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
> 226044 hdfs 20 0 50.6g 48g 4780 S 90.5 77.0 14728:27
> /usr/java/jdk1.8.0_162/bin/java -Dproc_datanode{code}
> !image-2020-08-15-17-45-27-363.png!
> !image-2020-08-15-17-50-48-598.png!
> This excessive memory usage makes the machine unresponsive (if swap is
> enabled) or triggers the oom-killer.
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]