[
https://issues.apache.org/jira/browse/HADOOP-17209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17178225#comment-17178225
]
Sean Chow commented on HADOOP-17209:
------------------------------------
To dig into this, I enabled the JVM's {{Native Memory Tracking}} and found
it is a JNI (Java Native Interface) call issue:
{code:java}
[0x00007f2a7b5e2ea5] jni_GetIntArrayElements+0x2b5
(malloc=2444372KB +225563KB #433115199 +31527004)
[0x00007f2a7b5e2ea5] jni_GetIntArrayElements+0x2b5
[0x00007f2a4c7c35b9] getOutputs+0x79
(malloc=2457719KB +227373KB #434823615 +31758718)
[0x00007f2a7b5e2ea5] jni_GetIntArrayElements+0x2b5
[0x00007f2a4c7c34c9] getInputs+0x79
(malloc=8479302KB +618477KB #434823615 +31758718)
{code}
The attached file datanode.202137.detail_diff.5.txt is the output of {{jcmd 202137
VM.native_memory detail.diff}}, which tracks the diff against the baseline over roughly 24 hours.
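For reference, this is roughly how such an NMT diff is produced (a sketch; {{<pid>}} stands for the datanode process id, and the first line is a JVM startup flag, not a shell command):

{code:bash}
# Start the JVM with native memory tracking enabled (adds some overhead):
#   -XX:NativeMemoryTracking=detail

# Record a baseline, then diff against it after the leak has grown:
jcmd <pid> VM.native_memory baseline
jcmd <pid> VM.native_memory detail.diff
{code}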
Okay, it's clear now. Memory obtained through the JNI function {{GetIntArrayElements}}
must be released manually, according to
[https://www.eg.bucknell.edu/~mead/Java-tutorial/native1.1/implementing/array.html]
{quote}
Similar to the Get<type>ArrayElements functions, the JNI provides a set of
Release<type>ArrayElements functions. Do not forget to call the appropriate
Release<type>ArrayElements function, such as ReleaseIntArrayElements. If you
forget to make this call, the array stays pinned for an extended period of
time, or the Java Virtual Machine is unable to reclaim the memory used to
store the nonmovable copy of the array.
{quote}
But I didn't find any call to {{ReleaseIntArrayElements}} in the erasure-coding
native code.
I have a patch running on my datanode. If it works well, I will attach it soon.
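To make the leak pattern concrete without requiring a JVM, here is a plain-C analogy of the Get/Release pairing (all names here are hypothetical stand-ins, not the actual Hadoop code): {{GetIntArrayElements}} may hand native code a malloc'ed copy of the Java array, and that copy stays allocated until the matching {{ReleaseIntArrayElements}} call.

{code:c}
#include <stdlib.h>
#include <string.h>

/* Analogy of JNI's GetIntArrayElements: returns a malloc'ed copy of
 * the caller's array that the caller must release explicitly. */
static int *get_int_array_elements(const int *src, size_t n) {
    int *copy = malloc(n * sizeof(int));
    if (copy != NULL)
        memcpy(copy, src, n * sizeof(int));
    return copy;
}

/* The matching release. Skipping this on any code path leaks the
 * copy on every call -- the pattern behind this issue. */
static void release_int_array_elements(int *copy) {
    free(copy);
}

/* Correct usage: every Get is paired with a Release. */
int sum_with_release(const int *src, size_t n) {
    int *elems = get_int_array_elements(src, n);
    if (elems == NULL)
        return -1;
    int sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += elems[i];
    release_int_array_elements(elems);  /* always pair Get with Release */
    return sum;
}
{code}

In real JNI code, the release side is {{(*env)->ReleaseIntArrayElements(env, array, elems, mode)}}, where the mode controls whether changes are copied back to the Java array; omitting it on any return path leaks the native copy, as the NMT diff above shows.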
> ErasureCode native library memory leak
> --------------------------------------
>
> Key: HADOOP-17209
> URL: https://issues.apache.org/jira/browse/HADOOP-17209
> Project: Hadoop Common
> Issue Type: Bug
> Components: native
> Affects Versions: 3.3.0, 3.2.1, 3.1.3
> Reporter: Sean Chow
> Assignee: Sean Chow
> Priority: Major
> Attachments: image-2020-08-15-18-25-48-830.png
>
>
> We use both {{apache-hadoop-3.1.3}} and {{CDH-6.1.1-1.cdh6.1.1.p0.875250}}
> HDFS in production, and both show resident memory growing beyond the {{-Xmx}}
> value.
> These are the JVM options:
>
> {code:java}
> -Dproc_datanode -Dhdfs.audit.logger=INFO,RFAAUDIT
> -Dsecurity.audit.logger=INFO,RFAS -Djava.net.preferIPv4Stack=true
> -Xms8589934592 -Xmx8589934592 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC
> -XX:CMSInitiatingOccupancyFraction=70 -XX:+CMSParallelRemarkEnabled
> -XX:+HeapDumpOnOutOfMemoryError ...{code}
>
> The max JVM heap size is 8 GB, but the datanode's RSS is 48 GB:
> {code:java}
> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
> 226044 hdfs 20 0 50.6g 48g 4780 S 90.5 77.0 14728:27
> /usr/java/jdk1.8.0_162/bin/java -Dproc_datanode{code}
> !image-2020-08-15-17-45-27-363.png!
> !image-2020-08-15-17-50-48-598.png!
> This excessive memory usage makes the machine unresponsive (if swap is
> enabled) or triggers the oom-killer.
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]