My job is failing just before it finishes, after processing for a few hours. All
the relevant log info I can find is below, and the only thing I can see as a
potential problem is that the available space on one of the devices is pretty
low. Additionally, 2 map tasks failed because of malformed input, but I would
have thought Hadoop would just ignore those two tasks when completing the
reduce phase. Does anyone recognize anything in these logs that would help me
identify the problem?
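My (unverified) understanding is that the driver could be told to tolerate a
small fraction of failed maps with something like the sketch below, assuming
JobConf.setMaxMapTaskFailuresPercent exists in the release I'm running; treat
it as a guess rather than something I've tested:

    // Sketch of the driver change I'd expect to make, so a couple of bad
    // map tasks don't fail the whole job. setMaxMapTaskFailuresPercent is
    // from the mapred JobConf API; whether my release has it is an assumption.
    JobConf conf = new JobConf(SearchCount.class);
    conf.setJobName("searchcount");
    // ... set mapper, reducer, input/output formats and paths as usual ...
    conf.setMaxMapTaskFailuresPercent(1);  // tolerate up to 1% failed maps
    JobClient.runJob(conf);

Even so, that wouldn't obviously explain the reduce-phase failures below.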
A couple of task attempts failed, and then the job itself died with the following message:
07/09/24 15:12:15 INFO mapred.JobClient: Task Id : task_0001_m_004154_1, Status : FAILED
07/09/24 15:12:20 INFO mapred.JobClient: Task Id : task_0001_m_010939_1, Status : FAILED
07/09/24 15:12:24 INFO mapred.JobClient: Task Id : task_0001_m_004154_2, Status : FAILED
07/09/24 15:12:31 INFO mapred.JobClient: Task Id : task_0001_m_010939_2, Status : FAILED
07/09/24 15:12:35 INFO mapred.JobClient: map 100% reduce 100%
java.io.IOException: Job failed!
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:604)
        at SearchCount.main(SearchCount.java:168)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:585)
        at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:69)
        at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:140)
        at SearchDriver.main(SearchDriver.java:34)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:585)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
The logs for all of the tasktrackers end with
2007-09-24 15:13:34,205 WARN org.apache.hadoop.ipc.Server: IPC Server handler 0 on 50050, call getMapCompletionEvents(job_0001, 17180, 50) from 17.231.8.30:63506: output error
java.nio.channels.ClosedChannelException
        at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:125)
        at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:294)
        at org.apache.hadoop.ipc.SocketChannelOutputStream.flushBuffer(SocketChannelOutputStream.java:108)
        at org.apache.hadoop.ipc.SocketChannelOutputStream.write(SocketChannelOutputStream.java:89)
        at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
        at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
        at java.io.DataOutputStream.flush(DataOutputStream.java:106)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:592)
with only the Server handler number differing. Finally, the jobtracker logs
for the failed reduce tasks show:
2007-09-24 15:12:36,401 WARN org.apache.hadoop.mapred.ReduceTask: task_0001_r_000000_0 copy failed: task_0001_m_017073_0 from 17.231.8.33
2007-09-24 15:12:36,403 WARN org.apache.hadoop.mapred.ReduceTask: java.io.IOException: Server returned HTTP response code: 500 for URL: http://17.231.8.33:50060/mapOutput?map=task_0001_m_017073_0&reduce=0
        at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1152)
        at org.apache.hadoop.mapred.MapOutputLocation.getFile(MapOutputLocation.java:206)
        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:680)
        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:641)
Though they actually continue a few lines after that:
2007-09-24 15:12:37,393 INFO org.apache.hadoop.mapred.ReduceTask: task_0001_r_000000_0 Got 0 new map outputs from tasktracker and 0 map outputs from previous failures
2007-09-24 15:12:37,393 INFO org.apache.hadoop.mapred.ReduceTask: task_0001_r_000000_0 Got 88 known map output location(s); scheduling...
2007-09-24 15:12:37,394 INFO org.apache.hadoop.mapred.ReduceTask: task_0001_r_000000_0 Scheduled 2 of 88 known outputs (0 slow hosts and 86 dup hosts)
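If it helps with diagnosis, I can also have the client print per-attempt
completion events instead of relying on runJob()'s summary. A rough sketch of
what I'd add to the driver, using the mapred client API (I'm assuming the
method names below match the version I'm running, and haven't verified that):

    // Rough sketch: submit the job and poll for task completion events so each
    // failed map/reduce attempt is printed as it happens. RunningJob and
    // TaskCompletionEvent come from the mapred client API.
    JobConf conf = new JobConf(SearchCount.class);
    // ... same job setup as before ...
    JobClient client = new JobClient(conf);
    RunningJob running = client.submitJob(conf);
    int eventCounter = 0;
    while (!running.isComplete()) {
        TaskCompletionEvent[] events = running.getTaskCompletionEvents(eventCounter);
        eventCounter += events.length;
        for (TaskCompletionEvent event : events) {
            System.out.println(event.getTaskId() + " -> " + event.getTaskStatus());
        }
        Thread.sleep(5000);  // assumes the enclosing method declares throws Exception
    }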
Thanks.
Ross Boucher
[EMAIL PROTECTED]