I just saw that the job finally completed. However, it took one hour and 45 minutes for a small job that runs in about 1-2 minutes on a single node (outside the Hadoop framework). The reduce part took extremely long: the TaskTracker output shows about 5 of the sample sections for each of the copies (11 copies), so clearly something is limiting the reduce...

Any clues?

Thanks

Florian


Jobtracker log:
...
2008-01-28 19:08:01,981 INFO org.apache.hadoop.mapred.JobInProgress: Failed fetch notification #1 for task task_200801281756_0002_m_000007_0
2008-01-28 19:16:00,848 INFO org.apache.hadoop.mapred.JobInProgress: Failed fetch notification #2 for task task_200801281756_0001_m_000007_0
2008-01-28 19:32:33,485 INFO org.apache.hadoop.mapred.JobInProgress: Failed fetch notification #2 for task task_200801281756_0002_m_000008_0
2008-01-28 19:42:12,822 INFO org.apache.hadoop.mapred.JobInProgress: Failed fetch notification #3 for task task_200801281756_0001_m_000007_0
2008-01-28 19:42:12,822 INFO org.apache.hadoop.mapred.JobInProgress: Too many fetch-failures for output of task: task_200801281756_0001_m_000007_0 ... killing it
2008-01-28 19:42:12,823 INFO org.apache.hadoop.mapred.TaskInProgress: Error from task_200801281756_0001_m_000007_0: Too many fetch-failures
2008-01-28 19:42:12,823 INFO org.apache.hadoop.mapred.JobInProgress: Choosing normal task tip_200801281756_0001_m_000007
2008-01-28 19:42:12,823 INFO org.apache.hadoop.mapred.JobTracker: Adding task 'task_200801281756_0001_m_000007_1' to tip tip_200801281756_0001_m_000007, for tracker 'tracker_localhost:/127.0.0.1:43625'
2008-01-28 19:42:14,103 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'task_200801281756_0001_m_000007_0' from 'tracker_localhost.localdomain:/127.0.0.1:37469'
2008-01-28 19:42:14,538 INFO org.apache.hadoop.mapred.JobInProgress: Task 'task_200801281756_0001_m_000007_1' has completed tip_200801281756_0001_m_000007 successfully.
2008-01-28 19:51:44,625 INFO org.apache.hadoop.mapred.JobInProgress: Failed fetch notification #3 for task task_200801281756_0001_m_000006_0
2008-01-28 19:51:44,625 INFO org.apache.hadoop.mapred.JobInProgress: Too many fetch-failures for output of task: task_200801281756_0001_m_000006_0 ... killing it
2008-01-28 19:51:44,625 INFO org.apache.hadoop.mapred.TaskInProgress: Error from task_200801281756_0001_m_000006_0: Too many fetch-failures
2008-01-28 19:51:44,625 INFO org.apache.hadoop.mapred.JobInProgress: Choosing normal task tip_200801281756_0001_m_000006
2008-01-28 19:51:44,626 INFO org.apache.hadoop.mapred.JobTracker: Adding task 'task_200801281756_0001_m_000006_1' to tip tip_200801281756_0001_m_000006, for tracker 'tracker_localhost:/127.0.0.1:43625'
2008-01-28 19:51:44,671 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'task_200801281756_0001_m_000006_0' from 'tracker_localhost.localdomain:/127.0.0.1:37469'
2008-01-28 19:51:46,503 INFO org.apache.hadoop.mapred.JobInProgress: Task 'task_200801281756_0001_m_000006_1' has completed tip_200801281756_0001_m_000006 successfully.
2008-01-28 19:51:51,768 INFO org.apache.hadoop.mapred.TaskRunner: Saved output of task 'task_200801281756_0001_r_000000_0' to hdfs://humanufactory.com:9000/output
2008-01-28 19:51:51,768 INFO org.apache.hadoop.mapred.JobInProgress: Task 'task_200801281756_0001_r_000000_0' has completed tip_200801281756_0001_r_000000 successfully.
2008-01-28 19:51:51,779 INFO org.apache.hadoop.mapred.JobInProgress: Job job_200801281756_0001 has completed successfully.
2008-01-28 19:51:54,680 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'task_200801281756_0001_m_000006_0' from 'tracker_localhost.localdomain:/127.0.0.1:37469'
2008-01-28 19:51:54,680 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'task_200801281756_0001_m_000007_0' from 'tracker_localhost.localdomain:/127.0.0.1:37469'
2008-01-28 19:51:56,674 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'task_200801281756_0001_m_000002_0' from 'tracker_localhost:/127.0.0.1:43625'
2008-01-28 19:51:56,674 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'task_200801281756_0001_m_000004_0' from 'tracker_localhost:/127.0.0.1:43625'
...
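(Aside: to see at a glance which map outputs the reducer kept failing to fetch, one can tally the "Failed fetch notification" entries per map task. Here is a minimal sketch, not part of Hadoop; it uses a few of the JobTracker lines pasted above inline, but in practice you would read the actual log file.)

```python
import re
from collections import Counter

# A few JobTracker lines from the log above; replace with the real log file contents.
log_lines = [
    "2008-01-28 19:08:01,981 INFO org.apache.hadoop.mapred.JobInProgress: Failed fetch notification #1 for task task_200801281756_0002_m_000007_0",
    "2008-01-28 19:16:00,848 INFO org.apache.hadoop.mapred.JobInProgress: Failed fetch notification #2 for task task_200801281756_0001_m_000007_0",
    "2008-01-28 19:32:33,485 INFO org.apache.hadoop.mapred.JobInProgress: Failed fetch notification #2 for task task_200801281756_0002_m_000008_0",
    "2008-01-28 19:42:12,822 INFO org.apache.hadoop.mapred.JobInProgress: Failed fetch notification #3 for task task_200801281756_0001_m_000007_0",
]

# Match the task id in each "Failed fetch notification" entry.
pattern = re.compile(r"Failed fetch notification #\d+ for task (task_\S+)")

# Count how many fetch-failure notifications each map task accumulated.
failures = Counter(
    m.group(1) for line in log_lines if (m := pattern.search(line))
)

for task, count in failures.most_common():
    print(task, count)
```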


Tasktracker log:
2008-01-28 19:07:02,844 INFO org.apache.hadoop.mapred.TaskTracker: task_200801281756_0001_r_000000_0 0.2727273% reduce > copy (9 of 11 at 0.00 MB/s) >
2008-01-28 19:07:08,330 WARN org.apache.hadoop.mapred.TaskTracker: getMapOutput(task_200801281756_0001_m_000006_0,0) failed :
org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find task_200801281756_0001_m_000006_0/file.out.index in any of the configured local directories
        at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathToRead(LocalDirAllocator.java:359)
        at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathToRead(LocalDirAllocator.java:138)
        at org.apache.hadoop.mapred.TaskTracker$MapOutputServlet.doGet(TaskTracker.java:1967)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:689)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:802)
        at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:427)
        at org.mortbay.jetty.servlet.WebApplicationHandler.dispatch(WebApplicationHandler.java:475)
        at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:567)
        at org.mortbay.http.HttpContext.handle(HttpContext.java:1565)
        at org.mortbay.jetty.servlet.WebApplicationContext.handle(WebApplicationContext.java:635)
        at org.mortbay.http.HttpContext.handle(HttpContext.java:1517)
        at org.mortbay.http.HttpServer.service(HttpServer.java:954)
        at org.mortbay.http.HttpConnection.service(HttpConnection.java:814)
        at org.mortbay.http.HttpConnection.handleNext(HttpConnection.java:981)
        at org.mortbay.http.HttpConnection.handle(HttpConnection.java:831)
        at org.mortbay.http.SocketListener.handleConnection(SocketListener.java:244)
        at org.mortbay.util.ThreadedServer.handle(ThreadedServer.java:357)
        at org.mortbay.util.ThreadPool$PoolThread.run(ThreadPool.java:534)

2008-01-28 19:07:08,330 WARN org.apache.hadoop.mapred.TaskTracker: Unknown child with bad map output: task_200801281756_0001_m_000006_0. Ignored.
2008-01-28 19:07:08,864 INFO org.apache.hadoop.mapred.TaskTracker: task_200801281756_0001_r_000000_0 0.2727273% reduce > copy (9 of 11 at 0.00 MB/s) >
2008-01-28 19:07:14,884 INFO org.apache.hadoop.mapred.TaskTracker: task_200801281756_0001_r_000000_0 0.2727273% reduce > copy (9 of 11 at 0.00 MB/s) >
2008-01-28 19:07:17,894 INFO org.apache.hadoop.mapred.TaskTracker: task_200801281756_0001_r_000000_0 0.2727273% reduce > copy (9 of 11 at 0.00 MB/s) >
2008-01-28 19:07:23,914 INFO org.apache.hadoop.mapred.TaskTracker: task_200801281756_0001_r_000000_0 0.2727273% reduce > copy (9 of 11 at 0.00 MB/s) >
2008-01-28 19:07:29,934 INFO org.apache.hadoop.mapred.TaskTracker: task_200801281756_0001_r_000000_0 0.2727273% reduce > copy (9 of 11 at 0.00 MB/s) >
2008-01-28 19:07:32,944 INFO org.apache.hadoop.mapred.TaskTracker: task_200801281756_0001_r_000000_0 0.2727273% reduce > copy (9 of 11 at 0.00 MB/s) >
2008-01-28 19:07:38,964 INFO org.apache.hadoop.mapred.TaskTracker: task_200801281756_0001_r_000000_0 0.2727273% reduce > copy (9 of 11 at 0.00 MB/s) >
2008-01-28 19:07:44,983 INFO org.apache.hadoop.mapred.TaskTracker: task_200801281756_0001_r_000000_0 0.2727273% reduce > copy (9 of 11 at 0.00 MB/s) >
2008-01-28 19:07:47,994 INFO org.apache.hadoop.mapred.TaskTracker: task_200801281756_0001_r_000000_0 0.2727273% reduce > copy (9 of 11 at 0.00 MB/s) >



On Jan 28, 2008, at 8:43 PM, Amar Kamat wrote:

Hi,
Can you please attach the JobTracker log and the log of any of the stuck reducers? Is there any way I can reproduce this error?
Amar
Florian Leibert wrote:
hi,
I set up a cluster with 3 slaves and 1 master. When running the wordcount example, Hadoop execution halts after the map phase. It starts the reduce phase and generally gets to 10-20%, but halts there. I left it running for 30 minutes with no change. The log files don't show any errors. I reformatted the Hadoop FS that I created, and access to the files from all nodes seems fine.
Has anyone seen a similar behavior?

Thanks,
Florian



