Hi,

I am running Hadoop over 100 million files stored on a local file system.

Each split contains a subset of the file collection, and the RecordReader emits 
each file as a single record.
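
To be concrete, the setup is along these lines (a simplified sketch rather than my 
actual code; class names are illustrative and the 0.20 "new" mapreduce API is 
assumed). Each file becomes one record, keyed by its path:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

public class WholeFileInputFormat extends FileInputFormat<Text, BytesWritable> {

    // One file = one record, so a file is never split.
    @Override
    protected boolean isSplitable(JobContext context, Path file) {
        return false;
    }

    @Override
    public RecordReader<Text, BytesWritable> createRecordReader(
            InputSplit split, TaskAttemptContext context) {
        return new WholeFileRecordReader();
    }

    // Reads the whole file backing the split and returns it as a single
    // (path, contents) record.
    public static class WholeFileRecordReader extends RecordReader<Text, BytesWritable> {
        private FileSplit split;
        private Configuration conf;
        private boolean processed = false;
        private final Text key = new Text();
        private final BytesWritable value = new BytesWritable();

        @Override
        public void initialize(InputSplit split, TaskAttemptContext context) {
            this.split = (FileSplit) split;
            this.conf = context.getConfiguration();
        }

        @Override
        public boolean nextKeyValue() throws IOException {
            if (processed) {
                return false;
            }
            Path file = split.getPath();
            byte[] contents = new byte[(int) split.getLength()];
            FileSystem fs = file.getFileSystem(conf);
            FSDataInputStream in = fs.open(file);
            try {
                IOUtils.readFully(in, contents, 0, contents.length);
            } finally {
                IOUtils.closeStream(in);
            }
            key.set(file.toString());
            value.set(contents, 0, contents.length);
            processed = true;
            return true;
        }

        @Override public Text getCurrentKey() { return key; }
        @Override public BytesWritable getCurrentValue() { return value; }
        @Override public float getProgress() { return processed ? 1.0f : 0.0f; }
        @Override public void close() { }
    }
}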

The problem is that I get java.lang.OutOfMemoryError: Java heap space.

I increased the heap size to 16 GB, which is enough to handle 20 million files.

At 100 million files I still get the same error and the job fails. Moreover, the job 
loading time already exceeds 40 minutes when handling 20 million files.
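
(For reference, my understanding of the stock 0.20 heap knobs is the sketch below; 
the values are placeholders and I may well be tuning the wrong one.)

import org.apache.hadoop.mapred.JobConf;

public class HeapSettingsSketch {
    public static JobConf sketch() {
        JobConf job = new JobConf();
        // Heap of the task JVMs (a mapred-site.xml / JobConf property).
        job.set("mapred.child.java.opts", "-Xmx2g");
        // The job client and the daemons take their heap from hadoop-env.sh instead, e.g.
        //   export HADOOP_HEAPSIZE=16000   # in MB; applies to anything launched via bin/hadoop
        return job;
    }
}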

Any suggestions for avoiding this delay and the heap-space related problems?

Regards,

On 10 May 2010, at 20:39, Marcin Sieniek wrote:

> Hi Jyothish,
> 
> I had exactly the same problem and solved it. To answer your question: in my 
> experience, HDFS and NFS are totally incompatible ;) However, you can configure 
> MapReduce to run on NFS only, without HDFS. See the second-to-last post here:
> http://old.nabble.com/Hadoop-over-Lustre--td19092864.html
> I did this and it works very well for NFS too (note that the old hadoop-site.xml 
> was split into core-site.xml, mapred-site.xml and hdfs-site.xml in newer 
> releases). Let me know if you have any problems with this configuration; a rough 
> sketch of the idea follows below.
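> 
> The idea is to point the default filesystem at the shared mount instead of HDFS. 
> The property names below are the stock 0.20 ones (they would normally go into 
> core-site.xml and mapred-site.xml) and the values are placeholders; none of this 
> is copied verbatim from the linked post:
> 
> import org.apache.hadoop.conf.Configuration;
> 
> public class NfsOnlyConfSketch {
>     public static Configuration sketch() {
>         Configuration conf = new Configuration();
>         // core-site.xml: use the local/NFS file system as the default fs, no HDFS at all.
>         conf.set("fs.default.name", "file:///");
>         // mapred-site.xml: the JobTracker address, plus a system dir visible to every node.
>         conf.set("mapred.job.tracker", "jobtracker-host:9001");
>         conf.set("mapred.system.dir", "/shared/nfs/mapred/system");
>         return conf;
>     }
> }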
> 
> Marcin
> 
> On 2010-05-10 20:16, alex kamil wrote:
>> 
>> Jyothish, 
>> 
>> as far as I know, it is not recommended to run Hadoop on NFS; you are supposed 
>> to use local volumes for all mapred and dfs directories.
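>> 
>> For instance, something along these lines (a sketch using the stock 0.20 property 
>> names; the local paths are placeholders):
>> 
>> import org.apache.hadoop.conf.Configuration;
>> 
>> public class LocalDirsSketch {
>>     public static Configuration sketch() {
>>         Configuration conf = new Configuration();
>>         // mapred-site.xml: MapReduce scratch space on local disks.
>>         conf.set("mapred.local.dir", "/local/disk1/mapred,/local/disk2/mapred");
>>         // hdfs-site.xml: DataNode block storage and NameNode metadata on local disks.
>>         conf.set("dfs.data.dir", "/local/disk1/dfs/data,/local/disk2/dfs/data");
>>         conf.set("dfs.name.dir", "/local/disk1/dfs/name");
>>         return conf;
>>     }
>> }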
>> 
>> Alex
>> 
>> On Mon, May 10, 2010 at 2:00 PM, Jyothish Soman <[email protected]> 
>> wrote:
>> I have a distributed system on NFS and wanted to use MapReduce on it, but the 
>> system keeps throwing errors about being unable to allocate temporary space, 
>> even though sufficient space is available; hence my question: 
>> Are HDFS and NFS compatible?
>> 
> 
