There is an element in the config for the child JVM's Java params (mapred.child.java.opts in hadoop-site.xml). Set it to -Xmx1024M and give it a shot. It definitely seems like a case of you running out of heap space.
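Something along these lines should do it (just a sketch; I'm assuming mapred.child.java.opts is still the property name on your version, where the default is a fairly small -Xmx200m):

    <property>
      <!-- JVM options passed to each map/reduce child task -->
      <name>mapred.child.java.opts</name>
      <value>-Xmx1024M</value>
    </property>

If I remember right, this goes into the job configuration, so setting it on the node you submit the job from should be enough for new jobs.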
A

-----Original Message-----
From: Emmanuel JOKE [mailto:[EMAIL PROTECTED]]
Sent: Saturday, June 30, 2007 10:32 AM
To: hadoop-user
Subject: OutOfMemory

Hi,

I tried to update my db using the following command:

    bin/nutch updatedb crawld/crawldb crawld/segments/20070628095836

and my 2 nodes had an error; I can see the following exception:

2007-06-30 12:24:29,688 INFO mapred.TaskInProgress - Error from task_0001_m_000000_1: java.lang.OutOfMemoryError: Java heap space
        at java.util.Arrays.copyOf(Arrays.java:2786)
        at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:94)
        at java.io.DataOutputStream.write(DataOutputStream.java:90)
        at org.apache.hadoop.io.Text.write(Text.java:243)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:316)
        at org.apache.nutch.crawl.CrawlDbFilter.map(CrawlDbFilter.java:99)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:175)
        at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1445)

Each of the 2 machines in my cluster uses 512 MB of memory. Isn't that enough? What is the best practice? Do you have any idea whether this is a bug, or is my configuration just not correct?

Thanks for your help
