There is a Python interface for accessing HDFS files, if that helps your case:
http://wiki.apache.org/hadoop/HDFS-APIs

thanks,
dhruba

On Tue, Oct 13, 2009 at 12:00 PM, Huy Phan <dac...@gmail.com> wrote:

> Hi Brian,
> Thank you for posting your solution here. I will try this on my testing
> server and do some load tests.
> Also, thank you for pointing out some leaks inside libhdfs. Actually, I'm
> writing a Python extension for HDFS and noticed some memory leaks, but I was
> not sure whether the bug was in my extension or somewhere else.
>
> Regards,
> Huy Phan
>
> Brian Bockelman wrote:
>
>> Hey Huy,
>>
>> Here's what we do:
>>
>> 1) include hdfsJniHelper.h
>> 2) Do the following when you're done with the filesystem:
>>
>>    if (NULL != fs) {
>>      //Get the JNIEnv* corresponding to current thread
>>      JNIEnv* env = getJNIEnv();
>>
>>      if (env == NULL) {
>>        ret = -EIO;
>>      } else {
>>
>>        //Parameters
>>        jobject jFS = (jobject)fs;
>>
>>        //Release unnecessary references
>>        (*env)->DeleteGlobalRef(env, jFS);
>>      }
>>    }
>>
>> I also recommend the patch below to remove a few other leaks.  This saves
>> about 0.5 KB of leaked memory per file open.
>>
>> Index: src/c++/libhdfs/hdfs.c
>> ===================================================================
>> --- src/c++/libhdfs/hdfs.c      (revision 806186)
>> +++ src/c++/libhdfs/hdfs.c      (working copy)
>> @@ -248,6 +249,7 @@
>>       destroyLocalReference(env, jUserString);
>>       destroyLocalReference(env, jGroups);
>>       destroyLocalReference(env, jUgi);
>> +      destroyLocalReference(env, jAttrString);
>>     }
>>  #else
>>
>> Index: src/c++/libhdfs/hdfsJniHelper.c
>> ===================================================================
>> --- src/c++/libhdfs/hdfsJniHelper.c     (revision 806186)
>> +++ src/c++/libhdfs/hdfsJniHelper.c     (working copy)
>> @@ -239,6 +241,7 @@
>>       fprintf(stderr, "ERROR: jelem == NULL\n");
>>     }
>>     (*env)->SetObjectArrayElement(env, result, i, jelem);
>> +    (*env)->DeleteLocalRef(env, jelem);
>>   }
>>   return result;
>>  }
>>
>>
>> Of course, this is not an official solution, not supported, may explode,
>> etc.
>>
>> Brian
>>
>> On Oct 13, 2009, at 12:40 PM, Huy Phan wrote:
>>
>>> Hi Eli,
>>> You're right that the problem is resolved in 0.20 with the function
>>> newInstance(). Unfortunately, my system is running on Hadoop 0.18.3 and I'm
>>> still looking for a way to patch this version without affecting the current
>>> system.
>>>
>>> Regards,
>>> Huy Phan
>>>
>>> Eli Collins wrote:
>>>
>>>> Hey Huy,
>>>>
>>>> What version of Hadoop are you using?  I think HADOOP-4655 may have
>>>> resolved the issue you're seeing, but I think it is only in 0.20 and later.
>>>>
>>>> Thanks,
>>>> Eli
>>>>
>>>> On Mon, Oct 12, 2009 at 8:52 PM, Huy Phan <dac...@gmail.com> wrote:
>>>>
>>>>> Hi All,
>>>>> I'm writing a multi-threaded application using libhdfs in C. A known
>>>>> issue of HDFS is that the FileSystem API caches FileSystem handles and
>>>>> always returns the same FileSystem handle when called from different
>>>>> threads. This means that even though I call hdfsConnect many times, I
>>>>> should not call hdfsDisconnect in any single thread.
>>>>> This may lead to a memory leak on the system. Do you know of any
>>>>> workaround for this issue?
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>
>
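
On the original question at the bottom of the thread (one cached FileSystem
handle shared by every thread), one possible application-side workaround is to
reference-count the handle and disconnect only when the last user is done.
The sketch below is only an illustration under assumptions: shared_fs,
shared_fs_acquire and shared_fs_release are hypothetical names that do not
exist in libhdfs, and the demo_* callbacks are stand-ins so the code compiles
without libhdfs. In a real program the callbacks would wrap hdfsConnect and
hdfsDisconnect. Whether disconnecting at the last release behaves well on
0.18.3 is untested here.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical wrapper types: not part of libhdfs. */
typedef void *(*connect_fn)(void);       /* would wrap hdfsConnect(host, port) */
typedef int (*disconnect_fn)(void *fs);  /* would wrap hdfsDisconnect(fs) */

typedef struct {
    pthread_mutex_t lock;
    void *fs;                 /* shared handle, NULL while disconnected */
    int refs;                 /* number of current users of the handle */
    connect_fn do_connect;
    disconnect_fn do_disconnect;
} shared_fs;

void shared_fs_init(shared_fs *s, connect_fn c, disconnect_fn d)
{
    pthread_mutex_init(&s->lock, NULL);
    s->fs = NULL;
    s->refs = 0;
    s->do_connect = c;
    s->do_disconnect = d;
}

/* Returns the shared handle, connecting only for the first user. */
void *shared_fs_acquire(shared_fs *s)
{
    void *fs;
    pthread_mutex_lock(&s->lock);
    if (s->refs == 0)
        s->fs = s->do_connect();
    if (s->fs != NULL)
        s->refs++;            /* count only successful acquisitions */
    fs = s->fs;
    pthread_mutex_unlock(&s->lock);
    return fs;
}

/* Drops one reference; disconnects only when the last user releases. */
int shared_fs_release(shared_fs *s)
{
    int rc = 0;
    pthread_mutex_lock(&s->lock);
    if (s->refs > 0 && --s->refs == 0) {
        rc = s->do_disconnect(s->fs);
        s->fs = NULL;
    }
    pthread_mutex_unlock(&s->lock);
    return rc;
}

/* Stand-in callbacks for illustration only; they count calls. */
int demo_connects = 0;
int demo_disconnects = 0;
int demo_handle;

void *demo_connect(void)
{
    demo_connects++;
    return &demo_handle;
}

int demo_disconnect(void *fs)
{
    (void)fs;
    demo_disconnects++;
    return 0;
}
```

Each thread would then call shared_fs_acquire where it used to call
hdfsConnect, and shared_fs_release where it used to call hdfsDisconnect, so
no single thread closes the handle while others are still using it.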


-- 
Connect to me at http://www.facebook.com/dhruba
