I got a similar error when running a MapReduce job on the master machine.
The map phase is fine and in the end the right results are in my
output folder, but the reduce hangs at 17% for a very long time. I found this
in one of the task logs a few times:
...
2008-06-18 17:31:02,297 INFO org.apache.hadoop.mapred.ReduceTask:
task_200806181716_0001_r_000000_0: Got 0 new map-outputs & 0 obsolete
map-outputs from tasktracker and 0 map-outputs from previous failures
2008-06-18 17:31:02,297 INFO org.apache.hadoop.mapred.ReduceTask:
task_200806181716_0001_r_000000_0 Got 0 known map output location(s);
scheduling...
2008-06-18 17:31:02,297 INFO org.apache.hadoop.mapred.ReduceTask:
task_200806181716_0001_r_000000_0 Scheduled 0 of 0 known outputs (0 slow hosts
and 0 dup hosts)
2008-06-18 17:31:03,276 WARN org.apache.hadoop.mapred.ReduceTask:
task_200806181716_0001_r_000000_0 copy failed:
task_200806181716_0001_m_000001_0 from koeln
2008-06-18 17:31:03,276 WARN org.apache.hadoop.mapred.ReduceTask:
java.net.ConnectException: Connection refused
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333)
at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:195)
at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:182)
at java.net.Socket.connect(Socket.java:519)
at sun.net.NetworkClient.doConnect(NetworkClient.java:152)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:394)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:529)
at sun.net.www.http.HttpClient.<init>(HttpClient.java:233)
at sun.net.www.http.HttpClient.New(HttpClient.java:306)
at sun.net.www.http.HttpClient.New(HttpClient.java:323)
at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:788)
at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:729)
at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:654)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:977)
at org.apache.hadoop.mapred.MapOutputLocation.getFile(MapOutputLocation.java:139)
at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:815)
at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:764)
2008-06-18 17:31:03,276 INFO org.apache.hadoop.mapred.ReduceTask: Task
task_200806181716_0001_r_000000_0: Failed fetch #7 from
task_200806181716_0001_m_000001_0
2008-06-18 17:31:03,276 INFO org.apache.hadoop.mapred.ReduceTask: Failed to
fetch map-output from task_200806181716_0001_m_000001_0 even after
MAX_FETCH_RETRIES_PER_MAP retries... reporting to the JobTracker
2008-06-18 17:31:03,276 WARN org.apache.hadoop.mapred.ReduceTask:
task_200806181716_0001_r_000000_0 adding host koeln to penalty box, next
contact in 150 seconds
2008-06-18 17:31:03,277 INFO org.apache.hadoop.mapred.ReduceTask:
task_200806181716_0001_r_000000_0 Need 1 map output(s)
2008-06-18 17:31:03,317 INFO org.apache.hadoop.mapred.ReduceTask:
task_200806181716_0001_r_000000_0: Got 0 new map-outputs & 0 obsolete
map-outputs from tasktracker and 1 map-outputs from previous failures
2008-06-18 17:31:03,317 INFO org.apache.hadoop.mapred.ReduceTask:
task_200806181716_0001_r_000000_0 Got 1 known map output location(s);
scheduling...
2008-06-18 17:31:03,317 INFO org.apache.hadoop.mapred.ReduceTask:
task_200806181716_0001_r_000000_0 Scheduled 0 of 1 known outputs (1 slow hosts
and 0 dup hosts)
2008-06-18 17:31:08,336 INFO org.apache.hadoop.mapred.ReduceTask:
task_200806181716_0001_r_000000_0 Need 1 map output(s)
2008-06-18 17:31:08,337 INFO org.apache.hadoop.mapred.ReduceTask:
task_200806181716_0001_r_000000_0: Got 0 new map-outputs & 0 obsolete
map-outputs from tasktracker and 0 map-outputs from previous failures
2008-06-18 17:31:08,337 INFO org.apache.hadoop.mapred.ReduceTask:
task_200806181716_0001_r_000000_0 Got 1 known map output location(s);
scheduling...
2008-06-18 17:31:08,337 INFO org.apache.hadoop.mapred.ReduceTask:
task_200806181716_0001_r_000000_0 Scheduled 0 of 1 known outputs (1 slow hosts
and 0 dup hosts)
2008-06-18 17:31:13,356 INFO org.apache.hadoop.mapred.ReduceTask:
task_200806181716_0001_r_000000_0 Need 1 map output(s)
...
Did I forget to open some ports? I opened 50010 for the datanode and the
ports for DFS and the jobtracker as specified in hadoop-site.xml.
If it's a firewall problem, wouldn't Hadoop recognize that at startup, i.e.
that connections would be refused?
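
One way to narrow this down might be to probe the TCP ports from the
machine where the reduce runs. The stack trace shows the copy going through
HttpURLConnection, so the map output is fetched over HTTP from the
tasktracker on koeln. Below is a minimal sketch in Python, assuming the
tasktracker HTTP port is still the default 50060
(mapred.task.tracker.http.address) and that 50010 is the datanode port
mentioned above. The function names are made up for illustration and are
not part of Hadoop:

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the host name and performs the TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "Connection refused", timeouts and failed name lookups
        return False


def check_ports(host, ports):
    """Map each port to whether a TCP connection to it succeeded."""
    return {port: can_connect(host, port) for port in ports}


# Example call (host name taken from the log; port numbers are assumptions,
# adjust them to whatever hadoop-site.xml actually configures):
#   check_ports("koeln", [50010, 50060])
```

A port that reports False from the master but True when the same check is
run on koeln itself would point to the firewall rather than to Hadoop.
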
On Wed, 2008-06-18 at 11:32 +0200, Alexander Arimond wrote:
> Thank you. I first tried the put from the master machine, which leads to
> the error. The put from the slave machine works. I guess you're right about
> the configuration parameters. It seems a bit strange to me, because the
> firewall settings and the hadoop-site.xml on both machines are identical.
>
> On Tue, 2008-06-17 at 14:08 -0700, Konstantin Shvachko wrote:
> > It looks like the client machine from which you call -put cannot connect to
> > the data-nodes.
> > It could be a firewall or wrong configuration parameters that you use for the
> > client.
> >
> > Alexander Arimond wrote:
> > > Hi,
> > >
> > > I'm new to Hadoop and I'm just testing it at the moment.
> > > I set up a cluster with 2 nodes and they seem to be running
> > > normally;
> > > the log files of the namenode and the datanodes don't show errors.
> > > The firewall should be set up correctly.
> > > But when I try to upload a file to the DFS I get the following message:
> > >
> > > [EMAIL PROTECTED]:~/hadoop$ bin/hadoop dfs -put file.txt file.txt
> > > 08/06/12 14:44:19 INFO dfs.DFSClient: Exception in
> > > createBlockOutputStream java.net.ConnectException: Connection refused
> > > 08/06/12 14:44:19 INFO dfs.DFSClient: Abandoning block
> > > blk_5837981856060447217
> > > 08/06/12 14:44:28 INFO dfs.DFSClient: Exception in
> > > createBlockOutputStream java.net.ConnectException: Connection refused
> > > 08/06/12 14:44:28 INFO dfs.DFSClient: Abandoning block
> > > blk_2573458924311304120
> > > 08/06/12 14:44:37 INFO dfs.DFSClient: Exception in
> > > createBlockOutputStream java.net.ConnectException: Connection refused
> > > 08/06/12 14:44:37 INFO dfs.DFSClient: Abandoning block
> > > blk_1207459436305221119
> > > 08/06/12 14:44:46 INFO dfs.DFSClient: Exception in
> > > createBlockOutputStream java.net.ConnectException: Connection refused
> > > 08/06/12 14:44:46 INFO dfs.DFSClient: Abandoning block
> > > blk_-8263828216969765661
> > > 08/06/12 14:44:52 WARN dfs.DFSClient: DataStreamer Exception:
> > > java.io.IOException: Unable to create new block.
> > > 08/06/12 14:44:52 WARN dfs.DFSClient: Error Recovery for block
> > > blk_-8263828216969765661 bad datanode[0]
> > >
> > >
> > > I don't know what that means and didn't find anything about it.
> > > I hope somebody can help with that.
> > >
> > > Thank you!
> > >
> > >
> >