We came across an issue where our jobs failed to report back to the tracker (https://issues.apache.org/jira/browse/HADOOP-1790). Now we are getting a little bit further: the map phase works just fine, but the reduce phase seems to be stuck at 0%. We are seeing the following in the logs:

2007-08-28 06:38:58,587 INFO org.apache.hadoop.mapred.TaskTracker: task_200708271639_0001_r_000000_0 0.0% reduce > copy >
2007-08-28 06:39:00,827 INFO org.apache.hadoop.mapred.TaskTracker: task_200708271639_0005_r_000000_0 0.0% reduce > copy >
2007-08-28 06:39:03,637 INFO org.apache.hadoop.mapred.TaskTracker: task_200708271639_0001_r_000000_0 0.0% reduce > copy >
2007-08-28 06:39:05,877 INFO org.apache.hadoop.mapred.TaskTracker: task_200708271639_0005_r_000000_0 0.0% reduce > copy >


2007-08-27 17:22:33,399 INFO org.apache.hadoop.mapred.ReduceTask: task_200708271639_0001_r_000000_0 Need 40 map output(s)
2007-08-27 17:22:33,400 INFO org.apache.hadoop.mapred.ReduceTask: task_200708271639_0001_r_000000_0 Got 0 new map outputs from tasktracker and 0 map outputs from previous failures
2007-08-27 17:22:33,400 INFO org.apache.hadoop.mapred.ReduceTask: task_200708271639_0001_r_000000_0 Got 42 known map output location(s); scheduling...
2007-08-27 17:22:33,400 INFO org.apache.hadoop.mapred.ReduceTask: task_200708271639_0001_r_000000_0 Scheduled 1 of 42 known outputs (24 slow hosts and 17 dup hosts)
2007-08-27 17:22:33,400 INFO org.apache.hadoop.mapred.ReduceTask: task_200708271639_0001_r_000000_0 Copying task_200708271639_0001_m_000001_0 output from host.domain.com.
2007-08-27 17:22:33,410 WARN org.apache.hadoop.mapred.ReduceTask: task_200708271639_0001_r_000000_0 copy failed: task_200708271639_0001_m_000001_0 from host.domain.com
2007-08-27 17:22:33,410 WARN org.apache.hadoop.mapred.ReduceTask: java.net.ConnectException: Connection refused
        at java.net.PlainSocketImpl.socketConnect(Native Method)
        at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333)
        at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:195)
        at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:182)
        at java.net.Socket.connect(Socket.java:516)
        at sun.net.NetworkClient.doConnect(NetworkClient.java:152)

What do "slow hosts" and "dup hosts" mean here, and why is the connection refused? Any suggestions?

We are on Hadoop 0.14 now.

cheers
--
Torsten
