Hello friends. I am an MTech student at IIT Bombay, and I am working on a project on Hadoop. I launch the job with one master and three slave nodes (the master is not a slave node itself). The map phase runs to completion successfully, but the reduce phase runs to about 16% and then fails with a shuffle error. The forums say this error arises when a slave running a reducer tries to fetch the map output from another slave node that ran the mapper, and the reducer slave is not able to resolve the hostname of the mapper slave; this causes the reducer slave to throw the shuffle error shown below. The problem seems to be about the settings in the /etc/hosts file.
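From what I have read on the forums, one way to confirm this is to check, on every node, whether the hostnames that appear in the error messages resolve to the machines' real LAN IPs rather than to loopback. Something like the following (cp-desktop, dove and grc1-desktop are the names from my logs below; I am not certain this check is sufficient):

    # run on every node in the cluster; each name should resolve to
    # that machine's real LAN IP, not to 127.0.0.1 or 127.0.1.1
    getent hosts cp-desktop
    getent hosts dove
    getent hosts grc1-desktop

    # the name a node reports for itself must match what the other nodes resolve
    hostname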
The terminal output is below:

11/08/14 19:35:32 INFO HadoopSweepLine: Launching the job.
11/08/14 19:35:32 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
11/08/14 19:35:32 INFO mapred.FileInputFormat: Total input paths to process : 1
11/08/14 19:35:33 INFO mapred.JobClient: Running job: job_201108141930_0002
11/08/14 19:35:34 INFO mapred.JobClient: map 0% reduce 0%
11/08/14 19:35:44 INFO mapred.JobClient: map 50% reduce 0%
11/08/14 19:35:47 INFO mapred.JobClient: map 100% reduce 0%
11/08/14 19:35:53 INFO mapred.JobClient: map 100% reduce 8%
11/08/14 19:35:59 INFO mapred.JobClient: map 100% reduce 0%
11/08/14 19:36:01 INFO mapred.JobClient: Task Id : attempt_201108141930_0002_r_000000_0, Status : FAILED
Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
11/08/14 19:36:01 WARN mapred.JobClient: Error reading task outputgrc1-desktop
11/08/14 19:36:01 WARN mapred.JobClient: Error reading task outputgrc1-desktop
11/08/14 19:36:03 INFO mapred.JobClient: Task Id : attempt_201108141930_0002_r_000001_0, Status : FAILED
Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
11/08/14 19:36:03 WARN mapred.JobClient: Error reading task outputcp-desktop
11/08/14 19:36:03 WARN mapred.JobClient: Error reading task outputcp-desktop
11/08/14 19:36:13 INFO mapred.JobClient: map 100% reduce 8%
11/08/14 19:36:16 INFO mapred.JobClient: map 100% reduce 0%
11/08/14 19:36:18 INFO mapred.JobClient: Task Id : attempt_201108141930_0002_r_000000_1, Status : FAILED
Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
11/08/14 19:36:18 WARN mapred.JobClient: Error reading task outputcp-desktop
11/08/14 19:36:18 WARN mapred.JobClient: Error reading task outputcp-desktop
11/08/14 19:36:18 INFO mapred.JobClient: Task Id : attempt_201108141930_0002_r_000001_1, Status : FAILED
Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
11/08/14 19:36:18 WARN mapred.JobClient: Error reading task outputdove
11/08/14 19:36:18 WARN mapred.JobClient: Error reading task outputdove
..... and it continues like this until the job fails.

Also, the job completes successfully with exactly one slave machine, because then the communication is only between the namenode and a single slave node; there is no slave-to-slave communication.

It would be a great help if anyone running Hadoop 0.20.1 on Ubuntu with multiple datanodes (not in pseudo-distributed mode) could post the contents of the /etc/hosts files of both the master and the slaves.

My /etc/hosts on the master is:

127.0.0.1      localhost.localdomain localhost
127.0.1.1      ubuntu
10.14.11.32    Abhishek-Master                          <<- master node
10.14.13.18    manjeet-home manjeet-home.localdomain    (slave)
10.129.26.215  cp-lab cp-lab.localdomain                (slave)
10.105.18.1    vadehra vadehra.localdomain              (slave)

# The following lines are desirable for IPv6 capable hosts
::1     localhost ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts

-------------

/etc/hosts on a slave (say cp-lab) is:

127.0.0.1    cp-lab localhost.localdomain localhost
127.0.1.1    cp-desktop
10.14.11.32  Abhishek-Master
10.14.13.18  manjeet-home manjeet-home.localdomain
10.105.18.1  vadehra vadehra.localdomain

# The following lines are desirable for IPv6 capable hosts
::1     ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters

-----------------------------------------------

Please, can somebody help me understand why the reducer slaves are not able to fetch the map output from the mapper slaves?
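For reference, from what I gather on the forums, every node's /etc/hosts probably needs two things: no 127.0.1.1 line mapping the machine's own hostname to loopback, and an entry for every node in the cluster (including the node itself) under the hostname its TaskTracker actually reports. The errors above mention cp-desktop, dove and grc1-desktop, which do not appear in my files at all. Here is a sketch of what I currently think the file should look like on every node; I am assuming cp-desktop is the real hostname of the cp-lab machine, and the IPs for dove and grc1-desktop are placeholders since I am not sure which machines those names belong to:

    127.0.0.1      localhost.localdomain localhost
    # no "127.0.1.1 <own-hostname>" line -- the node's own name
    # must resolve to its LAN IP, not to loopback

    10.14.11.32    Abhishek-Master
    10.14.13.18    manjeet-home
    10.129.26.215  cp-lab cp-desktop    # assuming cp-desktop is cp-lab's real hostname
    10.105.18.1    vadehra
    # <real-LAN-IP>  dove               # placeholder: fill in the actual IP
    # <real-LAN-IP>  grc1-desktop       # placeholder: fill in the actual IP

The idea, as I understand it, is that the same entries go on the master and on every slave, so that any reducer can resolve any mapper's hostname. Please correct me if this is wrong.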
Any help would be greatly appreciated. Thanks & regards.