Hi Brian,
Thank you for posting your solution here. I will try it on my test
server and run some load tests.
Thank you also for pointing out the leaks inside libhdfs. I'm actually
writing a Python extension for HDFS and noticed some memory leaks, but I
wasn't sure whether the bug was in my extension or somewhere else.
Regards,
Huy Phan
Brian Bockelman wrote:
Hey Huy,
Here's what we do:
1) Include hdfsJniHelper.h (it declares getJNIEnv()).
2) Do the following when you're done with the filesystem:
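if (NULL != fs) {
    // Get the JNIEnv* corresponding to the current thread
    JNIEnv* env = getJNIEnv();
    if (env == NULL) {
        ret = -EIO;  // ret: the enclosing function's return code
    } else {
        // The hdfsFS handle is a JNI global reference to the FileSystem object
        jobject jFS = (jobject)fs;
        // Release only this handle's global reference; unlike
        // hdfsDisconnect, this does not close the shared FileSystem
        (*env)->DeleteGlobalRef(env, jFS);
    }
}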
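The difference between the snippet above and hdfsDisconnect can be illustrated with a toy model. As far as I understand the libhdfs source of this period, hdfsConnect returns a new JNI global reference to the FileSystem object held in the Java-side cache, while hdfsDisconnect calls FileSystem.close() on that shared object before deleting its reference. The sketch below models only that bookkeeping in plain C; toy_fs, toy_handle, toy_connect, toy_disconnect and toy_release are hypothetical names, and no JVM or libhdfs call is involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for the one FileSystem object that the Java-side cache
 * hands back on every connect. Hypothetical model, not the libhdfs API. */
typedef struct {
    bool closed;    /* a close() is visible through every handle */
    int  live_refs; /* models the JNI global references still held */
} toy_fs;

/* Per-connect handle: models the global reference hdfsConnect returns. */
typedef struct {
    toy_fs *target;
} toy_handle;

/* Models hdfsConnect: same cached object, one more reference pinned. */
static toy_handle toy_connect(toy_fs *cache) {
    cache->live_refs++;
    toy_handle h = { cache };
    return h;
}

/* Models hdfsDisconnect: closes the shared object for every thread,
 * then drops this handle's reference. */
static void toy_disconnect(toy_handle *h) {
    assert(h->target != NULL);
    h->target->closed = true;
    h->target->live_refs--;
    h->target = NULL;
}

/* Models the DeleteGlobalRef-only snippet: drops this handle's
 * reference and leaves the shared object open for other threads. */
static void toy_release(toy_handle *h) {
    assert(h->target != NULL);
    h->target->live_refs--;
    h->target = NULL;
}
```

In this model, two threads that each call toy_connect share one object; toy_release on one of them brings live_refs back down without setting closed, whereas toy_disconnect would close the object under the other thread.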
I also recommend the patch below, which removes a few other leaks. It
saves about 0.5 KB of leaked memory per file open.
Index: src/c++/libhdfs/hdfs.c
===================================================================
--- src/c++/libhdfs/hdfs.c (revision 806186)
+++ src/c++/libhdfs/hdfs.c (working copy)
@@ -248,6 +249,7 @@
destroyLocalReference(env, jUserString);
destroyLocalReference(env, jGroups);
destroyLocalReference(env, jUgi);
+ destroyLocalReference(env, jAttrString);
}
#else
Index: src/c++/libhdfs/hdfsJniHelper.c
===================================================================
--- src/c++/libhdfs/hdfsJniHelper.c (revision 806186)
+++ src/c++/libhdfs/hdfsJniHelper.c (working copy)
@@ -239,6 +241,7 @@
fprintf(stderr, "ERROR: jelem == NULL\n");
}
(*env)->SetObjectArrayElement(env, result, i, jelem);
+ (*env)->DeleteLocalRef(env, jelem);
}
return result;
}
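For scale, assuming (hypothetically) a long-running server that opens one million files between restarts, the figure of about 0.5 KB leaked per file open adds up to:

$$0.5\ \text{KB/open} \times 10^{6}\ \text{opens} = 5 \times 10^{5}\ \text{KB} \approx 500\ \text{MB}$$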
Of course, this is not an official solution, not supported, may
explode, etc.
Brian
On Oct 13, 2009, at 12:40 PM, Huy Phan wrote:
Hi Eli,
You're right that the problem is resolved in 0.20 with function
newInstance(), unfortunately my system's running on Hadoop 0.18.3 and
i'm still looking for a way to patch this version without affecting
the current system.
Regards,
Huy Phan
Eli Collins wrote:
Hey Huy,
What version of Hadoop are you using? I think HADOOP-4655 may have
resolved the issue you're seeing, but I think it is only in 0.20 and later.
Thanks,
Eli
On Mon, Oct 12, 2009 at 8:52 PM, Huy Phan <dac...@gmail.com> wrote:
Hi All,
I'm writing a multi-threaded application using libhdfs in C. A known
issue with HDFS is that the FileSystem API caches FileSystem handles and
always returns the same handle when called from different threads. This
means that even though I call hdfsConnect many times, I should not call
hdfsDisconnect in any individual thread, since that would close the
handle shared by all of them.
This may lead to a memory leak on the system. Do you know of any
workaround for this issue?
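One application-side workaround, sketched here as an assumption rather than something proposed in this thread, is to call hdfsConnect once, share that single handle among all threads, and call hdfsDisconnect only when the last user is finished. Later libhdfs documentation describes the client handles as thread safe; whether that holds for 0.18.3 is an assumption. In the sketch below, shared_fs, shared_fs_init, shared_fs_acquire and shared_fs_release are hypothetical names, and the connect/disconnect callbacks stand in for hdfsConnect and hdfsDisconnect so the code builds without hdfs.h or a JVM.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical callbacks standing in for hdfsConnect()/hdfsDisconnect();
 * void * keeps the sketch independent of hdfs.h and the JVM. */
typedef void *(*connect_fn)(void);
typedef int (*disconnect_fn)(void *fs);

/* Hypothetical process-wide holder for one filesystem handle shared by
 * all threads, plus a count of threads currently using it. */
typedef struct {
    pthread_mutex_t lock;
    void *fs;
    int users;
    connect_fn do_connect;
    disconnect_fn do_disconnect;
} shared_fs;

static void shared_fs_init(shared_fs *s, connect_fn c, disconnect_fn d) {
    pthread_mutex_init(&s->lock, NULL);
    s->fs = NULL;
    s->users = 0;
    s->do_connect = c;
    s->do_disconnect = d;
}

/* Returns the shared handle, connecting on first use; NULL on failure. */
static void *shared_fs_acquire(shared_fs *s) {
    pthread_mutex_lock(&s->lock);
    if (s->fs == NULL) {
        s->fs = s->do_connect();
    }
    if (s->fs != NULL) {
        s->users++;
    }
    void *fs = s->fs;
    pthread_mutex_unlock(&s->lock);
    return fs;
}

/* Drops one user; only the last user actually disconnects. */
static int shared_fs_release(shared_fs *s) {
    int ret = 0;
    pthread_mutex_lock(&s->lock);
    assert(s->users > 0);
    if (--s->users == 0) {
        ret = s->do_disconnect(s->fs);
        s->fs = NULL;
    }
    pthread_mutex_unlock(&s->lock);
    return ret;
}

/* Fake backend so the sketch runs without a JVM; real code would call
 * hdfsConnect(...) and hdfsDisconnect(fs) here instead. */
static int fake_connects = 0;
static int fake_disconnects = 0;
static int fake_target;
static void *fake_connect(void) { fake_connects++; return &fake_target; }
static int fake_disconnect(void *fs) { (void)fs; fake_disconnects++; return 0; }
```

Each worker thread would call shared_fs_acquire() before touching HDFS and shared_fs_release() when done. Because only one hdfsConnect/hdfsDisconnect pair is active at a time, only one JNI global reference exists, and no thread closes the FileSystem while another is still using it.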