I tried to run mapred on 2 machines: 192.168.0.250 and 192.168.0.111. In nutch-site.xml I specified the following parameters:

1) On both machines:

<property>
  <name>fs.default.name</name>
  <value>192.168.0.250:9009</value>
  <description>The name of the default file system. Either the literal
  string "local" or a host:port for NDFS.</description>
</property>

<property>
  <name>mapred.job.tracker</name>
  <value>192.168.0.250:9010</value>
  <description>The host and port that the MapReduce job tracker runs at.
  If "local", then jobs are run in-process as a single map and reduce
  task.</description>
</property>

<property>
  <name>mapred.map.tasks</name>
  <value>2</value>
  <description>The default number of map tasks per job. Typically set to
  a prime several times greater than number of available hosts. Ignored
  when mapred.job.tracker is "local".</description>
</property>

<property>
  <name>mapred.tasktracker.tasks.maximum</name>
  <value>2</value>
  <description>The maximum number of tasks that will be run simultaneously
  by a task tracker.</description>
</property>

<property>
  <name>mapred.reduce.tasks</name>
  <value>2</value>
  <description>The default number of reduce tasks per job. Typically set
  to a prime close to the number of available hosts. Ignored when
  mapred.job.tracker is "local".</description>
</property>

On 192.168.0.250 I started:

2) bin/nutch-daemon.sh start datanode
3) bin/nutch-daemon.sh start namenode
4) bin/nutch-daemon.sh start jobtracker
5) bin/nutch-daemon.sh start tasktracker

I created a directory seeds with a file urls in it. The urls file contained 2 links. Then I added that directory to NDFS (bin/nutch ndfs -put ./seeds seeds). The directory was added successfully.

Then I ran the command:

bin/nutch crawl seeds -depth 2

As a result I got the following log written by the jobtracker:

....
051123 053118 Adding task 'task_m_z66npx' to set for tracker 'tracker_53845'
051123 053118 Adding task 'task_m_xaynqo' to set for tracker 'tracker_11518'
051123 053130 Task 'task_m_z66npx' has finished successfully.

Log written by the tasktracker on 192.168.0.111:

......
051110 142607 task_m_z66npx 0.0% /user/root/seeds/urls:0+31
051110 142607 task_m_z66npx 1.0% /user/root/seeds/urls:0+31
051110 142607 Task task_m_z66npx is done.

Log written by the tasktracker on 192.168.0.250:

....
051123 053125 task_m_xaynqo 0.12903225% /user/root/seeds/urls:31+31
051123 053126 task_m_xaynqo -683.9677% /user/root/seeds/urls:31+31
051123 053127 task_m_xaynqo -2129.9678% /user/root/seeds/urls:31+31
051123 053128 task_m_xaynqo -3483.0322% /user/root/seeds/urls:31+31
051123 053129 task_m_xaynqo -4976.2256% /user/root/seeds/urls:31+31
051123 053130 task_m_xaynqo -6449.1934% /user/root/seeds/urls:31+31
051123 053131 task_m_xaynqo -7898.258% /user/root/seeds/urls:31+31
051123 053132 task_m_xaynqo -9232.193% /user/root/seeds/urls:31+31
051123 053133 task_m_xaynqo -10694.3545% /user/root/seeds/urls:31+31
051123 053134 task_m_xaynqo -12139.226% /user/root/seeds/urls:31+31
051123 053135 task_m_xaynqo -13416.677% /user/root/seeds/urls:31+31
051123 053136 task_m_xaynqo -14885.741% /user/root/seeds/urls:31+31
... and so on, i.e. the log kept filling with records showing steadily decreasing percentages.

I concluded that this was an attempt to split the inject step across the 2 machines, i.e. there were 2 tasks: 'task_m_z66npx' and 'task_m_xaynqo'. 'task_m_z66npx' finished successfully, while 'task_m_xaynqo' ran into problems (negative progress).

But if I change the parameter mapred.reduce.tasks to 4, all tasks finish successfully and everything works correctly.

-----Original Message-----
From: Doug Cutting [mailto:[EMAIL PROTECTED]
Sent: Tuesday, November 22, 2005 2:10 AM
To: nutch-dev@lucene.apache.org
Subject: Re: mapred.map.tasks

[EMAIL PROTECTED] wrote:
> Why we need parameter mapred.map.tasks greater than number of available
> host? If we set it equal to number of host, we got "negative progress
> percentages" problem.

Can you please post a simple example that demonstrates the "negative progress" problem? E.g., the minimal changes to your conf/ directory required to illustrate this, how you start your daemons, etc.

Thanks, Doug