[
https://issues.apache.org/jira/browse/HADOOP-17209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17181071#comment-17181071
]
Stephen O'Donnell commented on HADOOP-17209:
--------------------------------------------
From the tutorial posted here:
http://www.iitk.ac.in/esc101/05Aug/tutorial/native1.1/implementing/array.html
It does indeed seem that you must call ReleaseIntArrayElements each time you
call GetIntArrayElements, so the change makes sense to me. However, I have
never used JNI, so my knowledge of this area is limited.
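For my own reference, the pattern the tutorial describes boils down to something
like this (a minimal sketch only, not Hadoop code; {{offsets}} is a placeholder
jintArray, and JNI_ABORT is used because nothing is written back):
{code}
/* Minimal sketch of the Get/Release pairing described in the tutorial.
   "offsets" is a placeholder jintArray, not an actual Hadoop variable. */
jint *tmp = (*env)->GetIntArrayElements(env, offsets, NULL);
if (tmp != NULL) {
  /* ... read tmp[i] here ... */

  /* Every Get must be paired with a Release, or the copied/pinned array
     leaks. JNI_ABORT frees the buffer without copying changes back. */
  (*env)->ReleaseIntArrayElements(env, offsets, tmp, JNI_ABORT);
}
{code}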
Grepping the code for GetIntArrayElements, I currently see 3 occurrences:
{code}
$ pwd
/Users/sodonnell/source/upstream_hadoop/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/io/erasurecode
$ grep GetIntArrayElements *.c
jni_common.c: tmpInputOffsets = (int*)(*env)->GetIntArrayElements(env,
jni_common.c: tmpOutputOffsets = (int*)(*env)->GetIntArrayElements(env,
jni_rs_decoder.c: int* tmpErasedIndexes = (int*)(*env)->GetIntArrayElements(env,
{code}
This patch addresses 2 of them. For the third, in the getOutputs function in
jni_common.c, do we need a call to ReleaseIntArrayElements there too?
{code}
void getOutputs(JNIEnv *env, jobjectArray outputs, jintArray outputOffsets,
                unsigned char** destOutputs, int num) {
  int numOutputs = (*env)->GetArrayLength(env, outputs);
  int i, *tmpOutputOffsets;
  jobject byteBuffer;

  if (numOutputs != num) {
    THROW(env, "java/lang/InternalError", "Invalid outputs");
  }

  tmpOutputOffsets = (int*)(*env)->GetIntArrayElements(env,
                                                       outputOffsets, NULL);
  for (i = 0; i < numOutputs; i++) {
    byteBuffer = (*env)->GetObjectArrayElement(env, outputs, i);
    destOutputs[i] = (unsigned char *)((*env)->GetDirectBufferAddress(env,
                                                                      byteBuffer));
    destOutputs[i] += tmpOutputOffsets[i];
  }
}
{code}
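If a release really is needed there, I imagine it would look roughly like this
at the end of the function (just a sketch of the idea, not a tested change;
JNI_ABORT because the offsets are only read):
{code}
  /* Sketch only: release the array obtained via GetIntArrayElements above,
     otherwise the copied/pinned offsets are never freed. */
  (*env)->ReleaseIntArrayElements(env, outputOffsets,
                                  (jint*)tmpOutputOffsets, JNI_ABORT);
{code}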
[~seanlook] Have you been running with this patch in production for some time,
and are all EC operations working fine with it?
> Erasure Coding: Native library memory leak
> ------------------------------------------
>
> Key: HADOOP-17209
> URL: https://issues.apache.org/jira/browse/HADOOP-17209
> Project: Hadoop Common
> Issue Type: Bug
> Components: native
> Affects Versions: 3.3.0, 3.2.1, 3.1.3
> Reporter: Sean Chow
> Assignee: Sean Chow
> Priority: Major
> Attachments: HADOOP-17209.001.patch,
> datanode.202137.detail_diff.5.txt, image-2020-08-15-18-26-44-744.png,
> image-2020-08-20-12-35-39-906.png
>
>
> We use both {{apache-hadoop-3.1.3}} and {{CDH-6.1.1-1.cdh6.1.1.p0.875250}}
> HDFS in production, and both of them show memory usage growing beyond the
> {{-Xmx}} value.
> !image-2020-08-15-18-26-44-744.png!
>
> We use the EC strategy to save storage costs.
> These are the JVM options:
> {code:java}
> -Dproc_datanode -Dhdfs.audit.logger=INFO,RFAAUDIT
> -Dsecurity.audit.logger=INFO,RFAS -Djava.net.preferIPv4Stack=true
> -Xms8589934592 -Xmx8589934592 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC
> -XX:CMSInitiatingOccupancyFraction=70 -XX:+CMSParallelRemarkEnabled
> -XX:+HeapDumpOnOutOfMemoryError ...{code}
> The max JVM heap size is 8 GB, but we can see the datanode RSS memory is 48 GB.
> All the other datanodes in this HDFS cluster have the same issue.
> {code:java}
> PID    USER PR NI VIRT  RES SHR  S %CPU %MEM    TIME+ COMMAND
> 226044 hdfs 20  0 50.6g 48g 4780 S 90.5 77.0 14728:27 /usr/java/jdk1.8.0_162/bin/java -Dproc_datanode{code}
>
> This excessive memory usage makes the machine unresponsive (if swap is
> enabled), or triggers the oom-killer.
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]