Thanks for the reply. The user name was created properly as 'hadoop', so that is not the problem: the jobtracker is able to start tasks on the slave machine. (A quick way to confirm that the right daemons are running on each node is sketched below.)
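For completeness, here is the kind of check I mean. This is only a sketch, not pasted from my session: it assumes a Sun JDK so that jps is on the PATH, and the process IDs are made up.

  # On the master, after bin/start-all.sh. It should run both the master
  # and the worker daemons here, since it is listed in conf/slaves too.
  hadoop@dennis-laptop:~$ jps
  5401 NameNode
  5463 DataNode
  5529 JobTracker
  5597 TaskTracker

  # On the slave, only the worker daemons should appear.
  hadoop@traal:~$ jps
  3120 DataNode
  3188 TaskTracker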
I tried to play around and observed that if I make the remote machine the only slave (as opposed to the master node also acting as one of the slaves), the tasks run fine. It could be that making a node function as both master and slave is a bad idea, although I do not see any reason why it should be. I will try to get access to more slave machines and see if my guess is correct. Before blaming the dual role, though, I also want to rule out hostname resolution; see the sketch below.
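My (unconfirmed) reasoning: the reduce phase fetches map output over HTTP from the host name each tasktracker advertises, and Ubuntu's installer by default maps the machine's own hostname to 127.0.1.1 in /etc/hosts, which could make a host like 'traal' advertise an address the other machine cannot reach. A sketch of what I believe /etc/hosts should contain on both machines (the addresses are the ones from my setup):

  # /etc/hosts on BOTH machines; the key point is that each hostname
  # resolves to its LAN address, not to a loopback address.
  127.0.0.1       localhost
  192.168.1.150   dennis-laptop
  192.168.1.201   traal

  # Quick check from either machine:
  ping -c 1 traal           # should answer from 192.168.1.201
  ping -c 1 dennis-laptop   # should answer from 192.168.1.150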
thanks,
gaurav


jaylac wrote:
>
> Hi gaurav
>
> I'm also a beginner, but I'll try to share my views; they may not be
> correct.
>
> You said you've created a user name called "Hadoop" on both systems,
> but in the slaves file you've written [EMAIL PROTECTED] Is it not
> case sensitive? Try changing it to [EMAIL PROTECTED] on both
> systems.
>
> Also, try using ports 9010 and 9011 in the hadoop-site.xml file.
>
> These might be in no way related to your problem, but still, try them
> and let me know.
>
> Regards,
> Jaya
>
>
> Gaurav Agarwal wrote:
>>
>> Hi everyone!
>>
>> I am a new user of Hadoop and am trying to set up a small cluster
>> using Hadoop (the Mar 02 release) on Ubuntu 6.10 (Edgy), but I am
>> facing some issues doing that.
>>
>> I am trying to run the Hadoop 'wordcount' example program which comes
>> bundled with it. I am able to successfully run the program on a
>> single-node cluster (that is, using my local machine only). But when
>> I try to run the same program on a cluster of two machines, the
>> program hangs in the 'reduce' phase.
>>
>> Settings:
>>
>> Master node: 192.168.1.150 (dennis-laptop)
>> Slave node: 192.168.1.201 (traal)
>>
>> The user account on both master and slave is named: Hadoop
>>
>> Password-less ssh login to the slave from the master is working.
>>
>> JAVA_HOME is set appropriately in the hadoop-env.sh file on both
>> master and slave.
>>
>> MASTER
>>
>> 1) conf/slaves
>> localhost
>> [EMAIL PROTECTED]
>>
>> 2) conf/master
>> localhost
>>
>> 3) conf/hadoop-site.xml
>> <?xml version="1.0"?>
>> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
>>
>> <!-- Put site-specific property overrides in this file. -->
>>
>> <configuration>
>>   <property>
>>     <name>fs.default.name</name>
>>     <value>192.168.1.150:50000</value>
>>   </property>
>>
>>   <property>
>>     <name>mapred.job.tracker</name>
>>     <value>192.168.1.150:50001</value>
>>   </property>
>>
>>   <property>
>>     <name>dfs.replication</name>
>>     <value>2</value>
>>   </property>
>> </configuration>
>>
>> SLAVE
>>
>> 1) conf/slaves
>> localhost
>>
>> 2) conf/master
>> [EMAIL PROTECTED]
>>
>> 3) conf/hadoop-site.xml
>> <?xml version="1.0"?>
>> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
>>
>> <!-- Put site-specific property overrides in this file. -->
>>
>> <configuration>
>>   <property>
>>     <name>fs.default.name</name>
>>     <value>192.168.1.150:50000</value>
>>   </property>
>>
>>   <property>
>>     <name>mapred.job.tracker</name>
>>     <value>192.168.1.150:50001</value>
>>   </property>
>>
>>   <property>
>>     <name>dfs.replication</name>
>>     <value>2</value>
>>   </property>
>> </configuration>
>>
>> CONSOLE OUTPUT
>>
>> bin/hadoop jar hadoop-*-examples.jar wordcount -m 10 -r 2 input output
>> 07/03/06 23:17:17 INFO mapred.InputFormatBase: Total input paths to process : 1
>> 07/03/06 23:17:18 INFO mapred.JobClient: Running job: job_0001
>> 07/03/06 23:17:19 INFO mapred.JobClient:  map 0% reduce 0%
>> 07/03/06 23:17:29 INFO mapred.JobClient:  map 20% reduce 0%
>> 07/03/06 23:17:30 INFO mapred.JobClient:  map 40% reduce 0%
>> 07/03/06 23:17:32 INFO mapred.JobClient:  map 80% reduce 0%
>> 07/03/06 23:17:33 INFO mapred.JobClient:  map 100% reduce 0%
>> 07/03/06 23:17:42 INFO mapred.JobClient:  map 100% reduce 3%
>> 07/03/06 23:17:43 INFO mapred.JobClient:  map 100% reduce 5%
>> 07/03/06 23:17:44 INFO mapred.JobClient:  map 100% reduce 8%
>> 07/03/06 23:17:52 INFO mapred.JobClient:  map 100% reduce 10%
>> 07/03/06 23:17:53 INFO mapred.JobClient:  map 100% reduce 13%
>> 07/03/06 23:18:03 INFO mapred.JobClient:  map 100% reduce 16%
>>
>> The only exception I can see in the log files is in the TaskTracker
>> log file:
>>
>> 2007-03-06 23:17:32,214 INFO org.apache.hadoop.mapred.TaskRunner:
>> task_0001_r_000000_0 Copying task_0001_m_000002_0 output from traal.
>> 2007-03-06 23:17:32,221 INFO org.apache.hadoop.mapred.TaskRunner:
>> task_0001_r_000000_0 Copying task_0001_m_000001_0 output from
>> dennis-laptop.
>> 2007-03-06 23:17:32,368 WARN org.apache.hadoop.mapred.TaskRunner:
>> task_0001_r_000000_0 copy failed: task_0001_m_000002_0 from traal
>> 2007-03-06 23:17:32,368 WARN org.apache.hadoop.mapred.TaskRunner:
>> java.io.IOException: File
>> /tmp/hadoop-hadoop/mapred/local/task_0001_r_000000_0/map_2.out-0 not created
>>         at org.apache.hadoop.mapred.ReduceTaskRunner$MapOutputCopier.copyOutput(ReduceTaskRunner.java:301)
>>         at org.apache.hadoop.mapred.ReduceTaskRunner$MapOutputCopier.run(ReduceTaskRunner.java:262)
>>
>> 2007-03-06 23:17:32,369 WARN org.apache.hadoop.mapred.TaskRunner:
>> task_0001_r_000000_0 adding host traal to penalty box, next contact
>> in 99 seconds
>>
>> I am attaching the master log files in case anyone wants to check
>> them.
>>
>> Any help will be greatly appreciated!
>>
>> -gaurav
>>
>> http://www.nabble.com/file/7013/hadoop-hadoop-tasktracker-dennis-laptop.log
>> http://www.nabble.com/file/7012/hadoop-hadoop-jobtracker-dennis-laptop.log
>> http://www.nabble.com/file/7011/hadoop-hadoop-namenode-dennis-laptop.log
>> http://www.nabble.com/file/7010/hadoop-hadoop-datanode-dennis-laptop.log
