Hello friends. I am an M.Tech student at IIT Bombay, and I am working on a
Hadoop project. When I launch a job with one master and three slave nodes
(the master is not a slave node itself), the map phase runs to completion
successfully, but the reduce phase runs to about 16% completion and then
fails with a shuffle error. The forums suggest that this error arises when a
slave running a reducer tries to fetch the map output from another slave
node that ran the mapper, but the reducer slave is not able to resolve the
hostname of the mapper slave, so it throws the shuffle error. The problem
therefore seems to be with the settings in the /etc/hosts files. The
terminal output is below; after the log I have sketched what I believe the
failing fetch looks like.

11/08/14 19:35:32 INFO HadoopSweepLine: Launching the job.
11/08/14 19:35:32 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
11/08/14 19:35:32 INFO mapred.FileInputFormat: Total input paths to process : 1
11/08/14 19:35:33 INFO mapred.JobClient: Running job: job_201108141930_0002
11/08/14 19:35:34 INFO mapred.JobClient:  map 0% reduce 0%
11/08/14 19:35:44 INFO mapred.JobClient:  map 50% reduce 0%
11/08/14 19:35:47 INFO mapred.JobClient:  map 100% reduce 0%
11/08/14 19:35:53 INFO mapred.JobClient:  map 100% reduce 8%
11/08/14 19:35:59 INFO mapred.JobClient:  map 100% reduce 0%
11/08/14 19:36:01 INFO mapred.JobClient: Task Id : attempt_201108141930_0002_r_000000_0, Status : FAILED
Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
11/08/14 19:36:01 WARN mapred.JobClient: Error reading task outputgrc1-desktop
11/08/14 19:36:01 WARN mapred.JobClient: Error reading task outputgrc1-desktop
11/08/14 19:36:03 INFO mapred.JobClient: Task Id : attempt_201108141930_0002_r_000001_0, Status : FAILED
Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
11/08/14 19:36:03 WARN mapred.JobClient: Error reading task outputcp-desktop
11/08/14 19:36:03 WARN mapred.JobClient: Error reading task outputcp-desktop
11/08/14 19:36:13 INFO mapred.JobClient:  map 100% reduce 8%
11/08/14 19:36:16 INFO mapred.JobClient:  map 100% reduce 0%
11/08/14 19:36:18 INFO mapred.JobClient: Task Id : attempt_201108141930_0002_r_000000_1, Status : FAILED
Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
11/08/14 19:36:18 WARN mapred.JobClient: Error reading task outputcp-desktop
11/08/14 19:36:18 WARN mapred.JobClient: Error reading task outputcp-desktop
11/08/14 19:36:18 INFO mapred.JobClient: Task Id : attempt_201108141930_0002_r_000001_1, Status : FAILED
Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
11/08/14 19:36:18 WARN mapred.JobClient: Error reading task outputdove
11/08/14 19:36:18 WARN mapred.JobClient: Error reading task outputdove
... and the same pattern repeats until the job fails. Note that the names
appended to the "Error reading task output" messages (grc1-desktop,
cp-desktop, dove) look like the machines' real hostnames, and none of them
resolves from the other nodes.
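From what I have read about Hadoop 0.20, a reduce task fetches map output
over HTTP from the TaskTracker on each mapper node, using the hostname that
TaskTracker reports for itself and the TaskTracker HTTP port (50060 by
default). If that is right, the failing fetch would look roughly like the
sketch below; the job and map IDs are placeholders, and cp-desktop is taken
from the log above:

# The reducer on one slave pulls map output from another slave's
# TaskTracker over HTTP, something like (IDs are placeholders):
#   http://cp-desktop:50060/mapOutput?job=job_201108141930_0002&map=...&reduce=0
# From the reducer's machine, that name does not resolve at all:
$ getent hosts cp-desktop
$ echo $?
2
# Every such failure counts as a failed unique fetch, until the reducer
# exceeds MAX_FAILED_UNIQUE_FETCHES and bails out with the error above.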

Also, the job completes successfully with exactly one slave machine, because
then the communication is only between the namenode and that single slave;
there is no slave-to-slave communication.
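
In case it helps, this is the kind of check I have been running from each
node; I am assuming the JVM resolves hostnames through the same resolver
that getent uses:

# Run on every slave: each node's hostname should resolve to its LAN IP
# (not 127.0.0.1 or 127.0.1.1), otherwise slave-to-slave fetches cannot work.
for h in Abhishek-Master manjeet-home cp-lab vadehra; do
    getent hosts "$h" || echo "$h: cannot be resolved"
done
hostname    # the name this machine will report to the other nodes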

It would be a great help if anyone running Hadoop (0.20.1) on Ubuntu with
multiple datanodes (i.e. not in pseudo-distributed mode) could post the
contents of the /etc/hosts files from both the master and the slaves.

My /etc/hosts on the master is:

127.0.0.1       localhost.localdomain localhost
127.0.1.1       ubuntu
10.14.11.32     Abhishek-Master                           # master node
10.14.13.18     manjeet-home manjeet-home.localdomain     # slave
10.129.26.215   cp-lab cp-lab.localdomain                 # slave
10.105.18.1     vadehra vadehra.localdomain               # slave

# The following lines are desirable for IPv6 capable hosts
::1     localhost ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts

-------------

My /etc/hosts on a slave (say cp-lab) is:

127.0.0.1       cp-lab localhost.localdomain localhost
127.0.1.1       cp-desktop
10.14.11.32     Abhishek-Master
10.14.13.18     manjeet-home manjeet-home.localdomain
10.105.18.1     vadehra vadehra.localdomain

# The following lines are desirable for IPv6 capable hosts
::1     ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
-----------------------------------------------
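
My own guess (please correct me if this is wrong) is that the loopback lines
are the culprit: on cp-lab, "127.0.0.1 cp-lab" makes cp-lab resolve to
loopback locally, and "127.0.1.1 cp-desktop" suggests the machine's real
hostname is cp-desktop, which no other node can resolve; that would match
the "Error reading task outputcp-desktop" lines in the log. If so, I imagine
a correct slave file would look something like the sketch below (assuming
cp-lab's LAN IP is 10.129.26.215 and its hostname is set to cp-lab):

127.0.0.1       localhost.localdomain localhost
10.129.26.215   cp-lab cp-lab.localdomain               # this machine's own LAN IP
10.14.11.32     Abhishek-Master
10.14.13.18     manjeet-home manjeet-home.localdomain
10.105.18.1     vadehra vadehra.localdomain
# IPv6 lines unchanged

But I am not sure, which is why I am asking for a working example.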

Can somebody please help me figure out why the reducer slaves are not able
to fetch the map output from the mapper slaves? Any help will be
appreciated.
Thanks & regards