Hi,

I'm using Hadoop 0.17.2.1 and trying to set up a cluster of 2 systems:

1 - Red Hat - master (JobTracker + NameNode + TaskTracker + DataNode)
1 - Ubuntu - slave (TaskTracker + DataNode)

Whenever I run the WordCount example, the map tasks finish, but the job stalls when it reaches the reduce tasks. It does complete, but only after a long time: about 43 min 58 sec for just 5 inputs.

This happens when the reduce task runs on the master. I see "Too many fetch-failures" for certain map tasks, and those tasks ran on the slave (Ubuntu). I also find DiskChecker$DiskErrorException in the logs.

After compiling, when I execute

bin/hadoop jar word/word.jar org.myorg.WordCount input output

I get:

08/09/10 15:12:56 INFO mapred.FileInputFormat: Total input paths to process : 5
08/09/10 15:12:56 INFO mapred.JobClient: Running job: job_200809101511_0003
08/09/10 15:12:57 INFO mapred.JobClient: map 0% reduce 0%
08/09/10 15:13:00 INFO mapred.JobClient: map 20% reduce 0%
08/09/10 15:13:01 INFO mapred.JobClient: map 80% reduce 0%
08/09/10 15:13:02 INFO mapred.JobClient: map 100% reduce 0%
08/09/10 15:13:11 INFO mapred.JobClient: map 100% reduce 13%
08/09/10 15:30:41 INFO mapred.JobClient: map 80% reduce 13%
08/09/10 15:30:41 INFO mapred.JobClient: Task Id : task_200809101511_0003_m_000000_0, Status : FAILED
Too many fetch-failures
08/09/10 15:30:42 WARN mapred.JobClient: Error reading task outputhttp://localhost:50060/tasklog?plaintext=true&taskid=task_200809101511_0003_m_000000_0&filter=stdout
08/09/10 15:30:42 WARN mapred.JobClient: Error reading task outputhttp://localhost:50060/tasklog?plaintext=true&taskid=task_200809101511_0003_m_000000_0&filter=stderr
08/09/10 15:30:44 INFO mapred.JobClient: map 100% reduce 13%
08/09/10 15:30:49 INFO mapred.JobClient: map 100% reduce 20%
08/09/10 15:40:52 INFO mapred.JobClient: Task Id : task_200809101511_0003_m_000004_0, Status : FAILED
Too many fetch-failures
08/09/10 15:40:52 WARN mapred.JobClient: Error reading task outputhttp://localhost:50060/tasklog?plaintext=true&taskid=task_200809101511_0003_m_000004_0&filter=stdout
08/09/10 15:40:52 WARN mapred.JobClient: Error reading task outputhttp://localhost:50060/tasklog?plaintext=true&taskid=task_200809101511_0003_m_000004_0&filter=stderr
08/09/10 15:41:03 INFO mapred.JobClient: map 100% reduce 26%

It stalls and finally finishes after 43 min.

In the TaskTracker's log, I found:

2008-09-11 13:09:14,067 INFO org.apache.hadoop.mapred.TaskTracker: task_200809111304_0002_m_000003_0 1.0% hdfs://master:54310/user/root/a1/f15:0+10348
2008-09-11 13:09:14,077 INFO org.apache.hadoop.mapred.TaskTracker: Task task_200809111304_0002_m_000003_0 is done.
2008-09-11 13:09:20,647 INFO org.apache.hadoop.mapred.TaskTracker: task_200809111304_0002_r_000000_0 0.20000002% reduce > copy (3 of 5 at 0.01 MB/s) >
2008-09-11 13:09:22,686 WARN org.apache.hadoop.mapred.TaskTracker: getMapOutput(task_200809111304_0002_m_000004_0,0) failed :
org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find taskTracker/jobcache/job_200809111304_0002/task_200809111304_0002_m_000004_0/output/file.out.index in any of the configured local directories
    at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathToRead(LocalDirAllocator.java:359)
    at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathToRead(LocalDirAllocator.java:138)
    at org.apache.hadoop.mapred.TaskTracker$MapOutputServlet.doGet(TaskTracker.java:2315)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:689)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:802)
    at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:427)
    at org.mortbay.jetty.servlet.WebApplicationHandler.dispatch(WebApplicationHandler.java:475)
    at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:567)
    at org.mortbay.http.HttpContext.handle(HttpContext.java:1565)
    at org.mortbay.jetty.servlet.WebApplicationContext.handle(WebApplicationContext.java:635)
    at org.mortbay.http.HttpContext.handle(HttpContext.java:1517)
    at org.mortbay.http.HttpServer.service(HttpServer.java:954)
    at org.mortbay.http.HttpConnection.service(HttpConnection.java:814)
    at org.mortbay.http.HttpConnection.handleNext(HttpConnection.java:981)
    at org.mortbay.http.HttpConnection.handle(HttpConnection.java:831)
    at org.mortbay.http.SocketListener.handleConnection(SocketListener.java:244)
    at org.mortbay.util.ThreadedServer.handle(ThreadedServer.java:357)
    at org.mortbay.util.ThreadPool$PoolThread.run(ThreadPool.java:534)
2008-09-11 13:09:22,686 WARN org.apache.hadoop.mapred.TaskTracker: Unknown child with bad map output: task_200809111304_0002_m_000004_0. Ignored.
2008-09-11 13:09:23,652 INFO org.apache.hadoop.mapred.TaskTracker: task_200809111304_0002_r_000000_0 0.26666668% reduce > copy (4 of 5 at 0.01 MB/s) >
2008-09-11 13:09:26,656 INFO org.apache.hadoop.mapred.TaskTracker: task_200809111304_0002_r_000000_0 0.26666668% reduce > copy (4 of 5 at 0.01 MB/s) >

The last message was repeated many times, and finally:

2008-09-11 13:22:04,694 WARN org.apache.hadoop.conf.Configuration: /root/Desktop/hadoop/hadoop-0.17.2.1/tmp/hadoop-root/mapred/local/taskTracker/jobcache/job_200809111304_0002/job.xml:a attempt to override final parameter: dfs.replication; Ignoring.
2008-09-11 13:22:05,879 INFO org.apache.hadoop.mapred.TaskTracker: task_200809111304_0002_m_000004_1 1.0% hdfs://master:54310/user/root/a1/f10:0+2309
2008-09-11 13:22:05,888 INFO org.apache.hadoop.mapred.TaskTracker: Task task_200809111304_0002_m_000004_1 is done.
2008-09-11 13:22:09,445 INFO org.apache.hadoop.mapred.TaskTracker: task_200809111304_0002_r_000000_0 0.26666668% reduce > copy (4 of 5 at 0.01 MB/s) >
2008-09-11 13:22:10,522 INFO org.apache.hadoop.mapred.TaskTracker: task_200809111304_0002_r_000000_0 0.8744081% reduce > reduce
2008-09-11 13:22:10,526 INFO org.apache.hadoop.mapred.TaskTracker: Task task_200809111304_0002_r_000000_0 is done.
2008-09-11 13:22:15,296 INFO org.apache.hadoop.mapred.TaskTracker: Received 'KillJobAction' for job: job_200809111304_0002
2008-09-11 13:22:15,296 INFO org.apache.hadoop.mapred.TaskRunner: task_200809111304_0002_m_000003_0 done; removing files.
2008-09-11 13:22:15,299 INFO org.apache.hadoop.mapred.TaskRunner: task_200809111304_0002_r_000000_0 done; removing files.
2008-09-11 13:22:15,301 INFO org.apache.hadoop.mapred.TaskRunner: task_200809111304_0002_m_000002_0 done; removing files.
SHUTDOWN_MSG:
/************************************************

The reduce task runs on the master (Red Hat). The task_200809101511_0003_m_000004_0 named in the log ran on the slave (Ubuntu).

In the JobTracker's log, I found:

2008-09-11 13:09:13,849 INFO org.apache.hadoop.mapred.JobInProgress: Task 'task_200809111304_0002_m_000002_0' has completed tip_200809111304_0002_m_000002 successfully.
2008-09-11 13:09:13,852 INFO org.apache.hadoop.mapred.JobTracker: Adding task 'task_200809111304_0002_r_000000_0' to tip tip_200809111304_0002_r_000000, for tracker 'tracker_master:localhost.localdomain/127.0.0.1:38315'
2008-09-11 13:09:14,696 INFO org.apache.hadoop.mapred.JobInProgress: Task 'task_200809111304_0002_m_000003_0' has completed tip_200809111304_0002_m_000003 successfully.
2008-09-11 13:09:14,996 INFO org.apache.hadoop.mapred.JobInProgress: Task 'task_200809111304_0002_m_000004_0' has completed tip_200809111304_0002_m_000004 successfully.
2008-09-11 13:11:59,832 INFO org.apache.hadoop.mapred.JobInProgress: Failed fetch notification #1 for task task_200809111304_0002_m_000004_0
2008-09-11 13:16:59,385 INFO org.apache.hadoop.mapred.JobInProgress: Failed fetch notification #2 for task task_200809111304_0002_m_000004_0
2008-09-11 13:22:04,659 INFO org.apache.hadoop.mapred.JobInProgress: Failed fetch notification #3 for task task_200809111304_0002_m_000004_0
2008-09-11 13:22:04,659 INFO org.apache.hadoop.mapred.JobInProgress: Too many fetch-failures for output of task: task_200809111304_0002_m_000004_0 ... killing it
2008-09-11 13:22:04,659 INFO org.apache.hadoop.mapred.TaskInProgress: Error from task_200809111304_0002_m_000004_0: Too many fetch-failures
2008-09-11 13:22:04,660 INFO org.apache.hadoop.mapred.JobInProgress: Choosing data-local task tip_200809111304_0002_m_000004
2008-09-11 13:22:04,660 INFO org.apache.hadoop.mapred.JobTracker: Adding task 'task_200809111304_0002_m_000004_1' to tip tip_200809111304_0002_m_000004, for tracker 'tracker_master:localhost.localdomain/127.0.0.1:38315'
2008-09-11 13:22:05,259 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'task_200809111304_0002_m_000004_0' from 'tracker_localhost:localhost/127.0.0.1:38957'
2008-09-11 13:22:06,496 INFO org.apache.hadoop.mapred.JobInProgress: Task 'task_200809111304_0002_m_000004_1' has completed tip_200809111304_0002_m_000004 successfully.
2008-09-11 13:22:11,228 INFO org.apache.hadoop.mapred.TaskRunner: Saved output of task 'task_200809111304_0002_r_000000_0' to hdfs://master:54310/user/root/b2
2008-09-11 13:22:11,228 INFO org.apache.hadoop.mapred.JobInProgress: Task 'task_200809111304_0002_r_000000_0' has completed tip_200809111304_0002_r_000000 successfully.
2008-09-11 13:22:11,241 INFO org.apache.hadoop.mapred.JobInProgress: Job job_200809111304_0002 has completed successfully.
2008-09-11 13:22:14,466 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'task_200809111304_0002_m_000004_0' from 'tracker_localhost:localhost/127.0.0.1:38957'
2008-09-11 13:22:15,294 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'task_200809111304_0002_m_000000_0' from 'tracker_master:localhost.localdomain/127.0.0.1:38315'
2008-09-11 13:22:15,294 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'task_200809111304_0002_m_000001_0' from 'tracker_master:localhost.localdomain/127.0.0.1:38315'
2008-09-11 13:22:15,294 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'task_200809111304_0002_m_000002_0' from 'tracker_master:localhost.localdomain/127.0.0.1:38315'
2008-09-11 13:22:15,294 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'task_200809111304_0002_m_000003_0' from 'tracker_master:localhost.localdomain/127.0.0.1:38315'
2008-09-11 13:22:15,295 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'task_200809111304_0002_m_000004_1' from 'tracker_master:localhost.localdomain/127.0.0.1:38315'
2008-09-11 13:22:15,295 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'task_200809111304_0002_r_000000_0' from 'tracker_master:localhost.localdomain/127.0.0.1:38315'
2008-09-11 14:25:56,282 INFO org.apache.hadoop.mapred.JobTracker: Serious problem. While updating status, cannot find taskid task_200809111304_0002_m_000004_0

hadoop-site.xml:

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://master:54310/</value>
    <final>true</final>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>master:54311</value>
    <final>true</final>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
    <final>true</final>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>absolute path</value>
    <final>true</final>
  </property>
  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx512M</value>
    <final>true</final>
  </property>
  <property>
    <name>mapred.speculative.execution</name>
    <value>false</value>
    <final>true</final>
  </property>
</configuration>

I don't know where I went wrong. Kindly help me solve this.

--
Best Regards
S.Chandravadana
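
P.S. The JobTracker log names the tracker that each task attempt was added to or removed from, so the attempts can be grouped by tracker to see how each TaskTracker registered itself. The following is only a minimal sketch under stated assumptions: the regular expression is written against the log lines quoted in this message (other Hadoop versions may format them differently), and `tasks_by_tracker`, `is_loopback`, and `SAMPLE_LINES` are illustrative names, not part of Hadoop.

```python
import ipaddress
import re
from collections import defaultdict

# Assumed line format, taken from the JobTracker excerpt in this message:
#   Adding task 'task_...' to tip ..., for tracker 'tracker_<name>:<host>/<ip>:<port>'
#   Removed completed task 'task_...' from 'tracker_<name>:<host>/<ip>:<port>'
LINE_RE = re.compile(
    r"(?:Adding task|Removed completed task) '(?P<task>task_\w+)'"
    r".*?'tracker_(?P<name>[^:']+):(?P<host>[^/']+)/(?P<ip>[\d.]+):(?P<port>\d+)'"
)


def tasks_by_tracker(lines):
    """Group task attempt IDs by the (name, host, ip) of the tracker in each line."""
    trackers = defaultdict(set)
    for line in lines:
        match = LINE_RE.search(line)
        if match:
            key = (match.group("name"), match.group("host"), match.group("ip"))
            trackers[key].add(match.group("task"))
    return trackers


def is_loopback(ip):
    """True if the address is a loopback address such as 127.0.0.1."""
    return ipaddress.ip_address(ip).is_loopback


# Two lines copied from the JobTracker log in this message.
SAMPLE_LINES = [
    "2008-09-11 13:22:04,660 INFO org.apache.hadoop.mapred.JobTracker: Adding task "
    "'task_200809111304_0002_m_000004_1' to tip tip_200809111304_0002_m_000004, "
    "for tracker 'tracker_master:localhost.localdomain/127.0.0.1:38315'",
    "2008-09-11 13:22:05,259 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task "
    "'task_200809111304_0002_m_000004_0' from 'tracker_localhost:localhost/127.0.0.1:38957'",
]

if __name__ == "__main__":
    for (name, host, ip), tasks in sorted(tasks_by_tracker(SAMPLE_LINES).items()):
        note = " (loopback address)" if is_loopback(ip) else ""
        print(f"tracker_{name} host={host} ip={ip}{note}")
        for task in sorted(tasks):
            print(f"  {task}")
```

To run it over a whole log, pass an open file object to `tasks_by_tracker` instead of `SAMPLE_LINES`. The loopback note only reports what the log says about each tracker's address; whether that has anything to do with the fetch failures is an assumption I have not verified.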
