We came across an issue where our jobs failed to report back to the
tracker (https://issues.apache.org/jira/browse/HADOOP-1790). Now we
are getting a little bit further: the map phase is working just fine,
but the reduce seems to be stuck at 0%. We are seeing the following
in the logs:

2007-08-28 06:38:58,587 INFO org.apache.hadoop.mapred.TaskTracker: task_200708271639_0001_r_000000_0 0.0% reduce > copy >
2007-08-28 06:39:00,827 INFO org.apache.hadoop.mapred.TaskTracker: task_200708271639_0005_r_000000_0 0.0% reduce > copy >
2007-08-28 06:39:03,637 INFO org.apache.hadoop.mapred.TaskTracker: task_200708271639_0001_r_000000_0 0.0% reduce > copy >
2007-08-28 06:39:05,877 INFO org.apache.hadoop.mapred.TaskTracker: task_200708271639_0005_r_000000_0 0.0% reduce > copy >

2007-08-27 17:22:33,399 INFO org.apache.hadoop.mapred.ReduceTask: task_200708271639_0001_r_000000_0 Need 40 map output(s)
2007-08-27 17:22:33,400 INFO org.apache.hadoop.mapred.ReduceTask: task_200708271639_0001_r_000000_0 Got 0 new map outputs from tasktracker and 0 map outputs from previous failures
2007-08-27 17:22:33,400 INFO org.apache.hadoop.mapred.ReduceTask: task_200708271639_0001_r_000000_0 Got 42 known map output location(s); scheduling...
2007-08-27 17:22:33,400 INFO org.apache.hadoop.mapred.ReduceTask: task_200708271639_0001_r_000000_0 Scheduled 1 of 42 known outputs (24 slow hosts and 17 dup hosts)
2007-08-27 17:22:33,400 INFO org.apache.hadoop.mapred.ReduceTask: task_200708271639_0001_r_000000_0 Copying task_200708271639_0001_m_000001_0 output from host.domain.com.
2007-08-27 17:22:33,410 WARN org.apache.hadoop.mapred.ReduceTask: task_200708271639_0001_r_000000_0 copy failed: task_200708271639_0001_m_000001_0 from host.domain.com
2007-08-27 17:22:33,410 WARN org.apache.hadoop.mapred.ReduceTask: java.net.ConnectException: Connection refused
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333)
    at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:195)
    at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:182)
    at java.net.Socket.connect(Socket.java:516)
    at sun.net.NetworkClient.doConnect(NetworkClient.java:152)

What do "slow hosts" and "dup hosts" mean here? Why is the connection
refused? Any suggestions?
We are on 0.14 now.
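
To narrow down the "Connection refused" part, one thing worth checking is whether the port that serves map output is reachable at all from the node running the reduce. Below is a minimal sketch in Python, not anything from Hadoop itself. The helper name can_connect is made up for illustration, and port 50060 is only an assumption about the tasktracker HTTP port; the real value should come from the cluster configuration.

```python
import socket
import sys


def can_connect(host, port, timeout=3.0):
    """Try a plain TCP connect; return (True, None) or (False, error)."""
    try:
        # create_connection resolves the name and opens a TCP connection.
        with socket.create_connection((host, port), timeout=timeout):
            return True, None
    except OSError as err:
        # Covers refused connections, timeouts and name-resolution failures.
        return False, err


if __name__ == "__main__":
    # host.domain.com is the placeholder host name from the log above.
    host = sys.argv[1] if len(sys.argv) > 1 else "host.domain.com"
    # 50060 is an ASSUMED tasktracker HTTP port; check your own config.
    port = int(sys.argv[2]) if len(sys.argv) > 2 else 50060
    ok, err = can_connect(host, port)
    if ok:
        print("connected to %s:%d" % (host, port))
    else:
        print("could not connect to %s:%d: %r" % (host, port, err))
```

Running this from the reducer's node against each map host would show whether the refusal is specific to Hadoop or whether nothing is listening on that address and port in the first place (for example, a host name that resolves to the wrong interface).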
cheers
--
Torsten