Thanks for your advice, but I finally fixed my problem by increasing the number
of maps to 200, as described in this tutorial:
http://wiki.apache.org/nutch/Nutch0%2e9-Hadoop0%2e10-Tutorial
==> "I noticed that the number of map and reduce task has an impact on the
performance of Hadoop. Many times after crawling a lot of pages the nodes
reported 
'java.lang.OutOfMemoryError<http://wiki.apache.org/nutch/OutOfMemoryError>:
Java heap space' errors, this happened also in the indexing part. Increasing
the number of maps solved these problems, with an index that has over
200.000 pages I needed 306 maps in total over 3 machines. By setting the
mapred.maps.tasks property in hadoop-site.xml to 99 (much higher than what
is advised in other tutorials and in the hadoop-site.xml file) that problem
is solved. *"*

Besides, there is something I don't understand. The default configuration is
mapred.tasktracker.tasks.maximum=2 and mapred.child.java.opts=-Xmx200m, so
increasing a machine's total memory from 512 MB to 1024 MB won't change
anything, will it?
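
(A quick back-of-the-envelope check with those numbers: 2 child tasks x 200 MB
of heap is only about 400 MB per node, which already fits in 512 MB, so the
extra RAM would stay unused unless -Xmx or the task maximum is raised as
well.)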

Anyway, thanks for your help.


If you are using machines with only 512MB of memory, it is probably a very
bad idea to set the minimum heap size so large.

-Xms400M might be more appropriate.
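
In hadoop-site.xml that would be something along these lines (just a sketch;
the max heap is set to match the initial size only so the example is
self-consistent, and with two child tasks per node this still commits about
800 MB):

  <property>
    <name>mapred.child.java.opts</name>
    <!-- start each child JVM at 400 MB instead of the 1024 MB suggested
         below; -Xmx is pinned to the same value for this sketch -->
    <value>-Xms400m -Xmx400m</value>
  </property>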

I should say, though, that if you have a program that is worth using Hadoop
on, you have a problem that is worth having more memory on each processor.
Most of the work I do benefits more from memory than from processor, at
least up to 1-2 GB of RAM.

On 6/30/07 11:51 AM, "Avinash Lakshman" <[EMAIL PROTECTED]> wrote:

There is an element in the config for Java params. Set it to -Xms1024M
and give it a shot. It definitely seems like a case of you running
out of heap space.
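
(Presumably the element meant here is mapred.child.java.opts in
hadoop-site.xml, i.e. something like:

  <property>
    <name>mapred.child.java.opts</name>
    <!-- -Xms sets the initial heap; pair it with an -Xmx at least as large -->
    <value>-Xms1024M -Xmx1024M</value>
  </property>
)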

A
-----Original Message-----
From: Emmanuel JOKE [mailto:[EMAIL PROTECTED]
 ...
My cluster of 2 machines uses 512 MB of memory each. Isn't that enough?
What is the best practice?

Do you have any idea whether this is a bug, or is it just my configuration
that is not correct?

Thanks for your help

