Hi Brian,

I tried the configuration changes you suggested, but they did not work for
me. (I am beginning to get the feeling that making a node function as both
master and slave is a bad idea!)

Could you run this experiment for me on your cluster?

Config: a 2-node cluster.
  Node 1: acts as both master and slave.
  Node 2: acts as slave only.
Input: a file of ~5 MB.

Run the wordcount example program with the command:

  bin/hadoop jar hadoop-0.12.0-examples.jar wordcount -m 4 input output
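To be concrete, here is the conf layout I have in mind for this experiment
(a sketch only -- the IP addresses are the ones from my own setup, so
please substitute yours; per your earlier advice, both nodes would get
identical copies of these files):

  conf/master (on both nodes):
    192.168.1.150

  conf/slaves (on both nodes):
    192.168.1.150
    192.168.1.201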
I really appreciate your help. Thanks in advance!

-gaurav


Brian Wedel-2 wrote:
>
> I am experimenting on a small cluster as well (4 machines) and I had
> success with the following configuration:
>
> - the configuration files on both the master and the slaves are the same
> - in the master/slave lists I only used the IP address (not localhost)
>   and omitted the user prefix, e.g. (hadoop@)
> - in the fs.default.name configuration variable use
>   hdfs://<host>:<port> (I don't know if this is necessary -- but it
>   seems you can specify other types of filesystems -- not sure which is
>   the default)
> - use the 0.12.0 release -- I was using 0.11.2 and was getting some odd
>   errors that disappeared when I upgraded
> - I don't run a datanode daemon on the same machine as the namenode --
>   this was a problem when I was trying the hadoop-streaming contributed
>   package for scripting. Not sure if it matters for the examples.
>
> This configuration worked for me.
> -Brian
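Brian, picking up your fs.default.name point inline: this is the form I
plan to try next. A sketch only -- I am assuming the hdfs:// prefix is
accepted with my existing host and port on 0.12.0:

  <property>
    <name>fs.default.name</name>
    <value>hdfs://192.168.1.150:50000</value>
  </property>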
> On 3/7/07, Gaurav Agarwal <[EMAIL PROTECTED]> wrote:
>>
>> Hi Richard,
>>
>> I am facing this error very consistently. I tried another nightly build
>> (4 Mar) as well, but it gave the same exception.
>>
>> thanks,
>> gaurav
>>
>>
>> Richard Yang-3 wrote:
>> >
>> > Hi Gaurav,
>> >
>> > Does this error always happen?
>> > Our settings are similar.
>> > My logs contain some error messages about IOExceptions: not able to
>> > obtain certain blocks, not able to create a new block. Although the
>> > program hung some of the time, in most cases it completed with
>> > correct results.
>> > Btw, I am running the grep sample program on version 0.11.2.
>> >
>> > Best Regards
>> >
>> > Richard Yang
>> > [EMAIL PROTECTED]
>> > [EMAIL PROTECTED]
>> >
>> >
>> > -----Original Message-----
>> > From: Gaurav Agarwal [mailto:[EMAIL PROTECTED]
>> > Sent: Wednesday, March 07, 2007 12:22 AM
>> > To: [email protected]
>> > Subject: Hadoop 'wordcount' program hanging in the Reduce phase.
>> >
>> >
>> > Hi Everyone!
>> > I am a new user to Hadoop and am trying to set up a small cluster,
>> > but I am facing some issues doing that.
>> >
>> > I am trying to run the Hadoop 'wordcount' example program that comes
>> > bundled with it. I am able to run the program successfully on a
>> > single-node cluster (that is, using my local machine only). But when
>> > I try to run the same program on a cluster of two machines, the
>> > program hangs in the 'reduce' phase.
>> >
>> >
>> > Settings:
>> >
>> > Master Node: 192.168.1.150 (dennis-laptop)
>> > Slave Node: 192.168.1.201 (traal)
>> >
>> > The user account on both Master and Slave is named: hadoop
>> >
>> > Password-less ssh login to the Slave from the Master is working.
>> >
>> > JAVA_HOME is set appropriately in the hadoop-env.sh file on both
>> > Master and Slave.
>> >
>> > MASTER
>> >
>> > 1) conf/slaves
>> > localhost
>> > [EMAIL PROTECTED]
>> >
>> > 2) conf/master
>> > localhost
>> >
>> > 3) conf/hadoop-site.xml
>> > <?xml version="1.0"?>
>> > <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
>> >
>> > <!-- Put site-specific property overrides in this file. -->
>> >
>> > <configuration>
>> >   <property>
>> >     <name>fs.default.name</name>
>> >     <value>192.168.1.150:50000</value>
>> >   </property>
>> >
>> >   <property>
>> >     <name>mapred.job.tracker</name>
>> >     <value>192.168.1.150:50001</value>
>> >   </property>
>> >
>> >   <property>
>> >     <name>dfs.replication</name>
>> >     <value>2</value>
>> >   </property>
>> > </configuration>
>> >
>> > SLAVE
>> >
>> > 1) conf/slaves
>> > localhost
>> >
>> > 2) conf/master
>> > [EMAIL PROTECTED]
>> >
>> > 3) conf/hadoop-site.xml
>> > <?xml version="1.0"?>
>> > <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
>> >
>> > <!-- Put site-specific property overrides in this file. -->
>> >
>> > <configuration>
>> >   <property>
>> >     <name>fs.default.name</name>
>> >     <value>192.168.1.150:50000</value>
>> >   </property>
>> >
>> >   <property>
>> >     <name>mapred.job.tracker</name>
>> >     <value>192.168.1.150:50001</value>
>> >   </property>
>> >
>> >   <property>
>> >     <name>dfs.replication</name>
>> >     <value>2</value>
>> >   </property>
>> > </configuration>
>> >
>> >
>> > CONSOLE OUTPUT
>> >
>> > bin/hadoop jar hadoop-*-examples.jar wordcount -m 10 -r 2 input output
>> > 07/03/06 23:17:17 INFO mapred.InputFormatBase: Total input paths to process : 1
>> > 07/03/06 23:17:18 INFO mapred.JobClient: Running job: job_0001
>> > 07/03/06 23:17:19 INFO mapred.JobClient: map 0% reduce 0%
>> > 07/03/06 23:17:29 INFO mapred.JobClient: map 20% reduce 0%
>> > 07/03/06 23:17:30 INFO mapred.JobClient: map 40% reduce 0%
>> > 07/03/06 23:17:32 INFO mapred.JobClient: map 80% reduce 0%
>> > 07/03/06 23:17:33 INFO mapred.JobClient: map 100% reduce 0%
>> > 07/03/06 23:17:42 INFO mapred.JobClient: map 100% reduce 3%
>> > 07/03/06 23:17:43 INFO mapred.JobClient: map 100% reduce 5%
>> > 07/03/06 23:17:44 INFO mapred.JobClient: map 100% reduce 8%
>> > 07/03/06 23:17:52 INFO mapred.JobClient: map 100% reduce 10%
>> > 07/03/06 23:17:53 INFO mapred.JobClient: map 100% reduce 13%
>> > 07/03/06 23:18:03 INFO mapred.JobClient: map 100% reduce 16%
>> >
>> >
>> > The only exception I can see in the log files is in the 'TaskTracker'
>> > log file:
>> >
>> > 2007-03-06 23:17:32,214 INFO org.apache.hadoop.mapred.TaskRunner:
>> > task_0001_r_000000_0 Copying task_0001_m_000002_0 output from traal.
>> > 2007-03-06 23:17:32,221 INFO org.apache.hadoop.mapred.TaskRunner:
>> > task_0001_r_000000_0 Copying task_0001_m_000001_0 output from
>> > dennis-laptop.
>> > 2007-03-06 23:17:32,368 WARN org.apache.hadoop.mapred.TaskRunner:
>> > task_0001_r_000000_0 copy failed: task_0001_m_000002_0 from traal
>> > 2007-03-06 23:17:32,368 WARN org.apache.hadoop.mapred.TaskRunner:
>> > java.io.IOException: File
>> > /tmp/hadoop-hadoop/mapred/local/task_0001_r_000000_0/map_2.out-0 not created
>> >   at org.apache.hadoop.mapred.ReduceTaskRunner$MapOutputCopier.copyOutput(ReduceTaskRunner.java:301)
>> >   at org.apache.hadoop.mapred.ReduceTaskRunner$MapOutputCopier.run(ReduceTaskRunner.java:262)
>> >
>> > 2007-03-06 23:17:32,369 WARN org.apache.hadoop.mapred.TaskRunner:
>> > task_0001_r_000000_0 adding host traal to penalty box, next contact
>> > in 99 seconds
>> >
>> > I am attaching the master log files just in case anyone wants to
>> > check them.
>> >
>> > Any help will be greatly appreciated!
>> >
>> > -gaurav
>> >
>> > http://www.nabble.com/file/7013/hadoop-hadoop-tasktracker-dennis-laptop.log
>> > hadoop-hadoop-tasktracker-dennis-laptop.log
>> > http://www.nabble.com/file/7012/hadoop-hadoop-jobtracker-dennis-laptop.log
>> > hadoop-hadoop-jobtracker-dennis-laptop.log
>> > http://www.nabble.com/file/7011/hadoop-hadoop-namenode-dennis-laptop.log
>> > hadoop-hadoop-namenode-dennis-laptop.log
>> > http://www.nabble.com/file/7010/hadoop-hadoop-datanode-dennis-laptop.log
>> > hadoop-hadoop-datanode-dennis-laptop.log
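P.S. One more thing I plan to double-check on my own cluster before the
next run -- this is only a guess on my part, not something the logs
confirm: since the reduce task on dennis-laptop has to fetch map output
from traal by hostname, both machines need to be able to resolve each
other's names. Something like the following in /etc/hosts on both
machines:

  192.168.1.150   dennis-laptop
  192.168.1.201   traal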
