[ https://issues.apache.org/jira/browse/HADOOP-5324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12676613#action_12676613 ]
Ramya R commented on HADOOP-5324:
---------------------------------
Below is the last 4 KB of the log file. After the messages below, the reducer
hangs forever.
{noformat}
2009-02-25 08:56:36,584 INFO org.apache.hadoop.mapred.ReduceTask: GetMapEventsThread exiting
2009-02-25 08:56:36,584 INFO org.apache.hadoop.mapred.ReduceTask: getMapsEventsThread joined.
2009-02-25 08:56:36,585 INFO org.apache.hadoop.mapred.ReduceTask: Closed ram manager
2009-02-25 08:56:36,585 INFO org.apache.hadoop.mapred.ReduceTask: Interleaved on-disk merge complete: 12 files left.
2009-02-25 08:56:36,586 INFO org.apache.hadoop.mapred.ReduceTask: In-memory merge complete: 15 files left.
2009-02-25 08:56:36,589 INFO org.apache.hadoop.mapred.ReduceTask: Keeping 15 segments, 66668024 bytes in memory for intermediate, on-disk merge
2009-02-25 08:56:36,597 INFO org.apache.hadoop.mapred.ReduceTask: Merging 12 files, 1072060708 bytes from disk
2009-02-25 08:56:36,599 INFO org.apache.hadoop.mapred.ReduceTask: Merging 0 segments, 0 bytes from memory into reduce
2009-02-25 08:56:36,599 INFO org.apache.hadoop.mapred.Merger: Merging 27 sorted segments
2009-02-25 08:56:49,155 INFO org.apache.hadoop.mapred.Merger: Merging 18 intermediate segments out of a total of 27
2009-02-25 08:59:44,318 INFO org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 10 segments left of total size: 1072060626 bytes
2009-02-25 09:21:26,758 WARN org.apache.hadoop.hdfs.DFSClient: DFSOutputStream ResponseProcessor exception for block <blk_ID1>java.net.SocketTimeoutException: 69000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/<hostname1:port1> remote=/<hostname1:port2>]
        at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
        at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
        at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
        at java.io.DataInputStream.readFully(DataInputStream.java:178)
        at java.io.DataInputStream.readLong(DataInputStream.java:399)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:2333)
2009-02-25 09:21:26,759 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block <blk_ID1> bad datanode[0] <hostname1:port2>
2009-02-25 09:21:26,759 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block <blk_ID1> in pipeline <hostname1:port2>, <hostname2:port1>, <hostname3:port1>: bad datanode <hostname1:port2>
2009-02-25 09:22:38,114 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.net.SocketTimeoutException: 69000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/<hostname1:port3> remote=/<hostname1:port2>]
2009-02-25 09:22:38,260 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block <blk_ID2>
2009-02-25 09:22:38,263 INFO org.apache.hadoop.hdfs.DFSClient: Waiting to find target node: <hostname1:port2>
2009-02-25 10:06:35,909 WARN org.apache.hadoop.hdfs.DFSClient: DFSOutputStream ResponseProcessor exception for block <blk_ID3>java.net.SocketTimeoutException: 69000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/<hostname1:port4> remote=/<hostname1:port2>]
        at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
        at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
        at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
        at java.io.DataInputStream.readFully(DataInputStream.java:178)
        at java.io.DataInputStream.readLong(DataInputStream.java:399)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:2333)
2009-02-25 10:06:43,847 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block <blk_ID3> bad datanode[0] <hostname1:port2>
2009-02-25 10:06:43,847 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block <blk_ID3> in pipeline <hostname1:port2>, <hostname4:port1>, <hostname5:port1>: bad datanode <hostname1:port2>
{noformat}
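For reference, the repeated "69000 millis timeout" entries look like the DFS client's read timeout firing while DFSOutputStream's ResponseProcessor waits for acks from the first datanode in the pipeline. My assumption (not verified against this cluster's configuration) is that the value is dfs.socket.timeout, default 60000 ms, plus a small per-datanode extension for the 3-node pipeline. Below is a minimal client-side sketch for stretching those timeouts while trying to reproduce the hang; the configuration keys exist in this Hadoop line, but the chosen values and the write loop are only illustrative.
{noformat}
// Hypothetical diagnostic sketch, not part of the fix for this issue: raise the
// DFS client timeouts that the log above keeps hitting, then redo the kind of
// HDFS write the reduce task performs. Values and the 69000 ms arithmetic
// (60 s default + a per-datanode extension for a 3-node pipeline) are assumptions.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class DfsWriteTimeoutProbe {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Read timeout used when waiting on datanode responses (default 60000 ms).
    conf.setInt("dfs.socket.timeout", 120000);
    // Write timeout towards the datanodes (default 480000 ms).
    conf.setInt("dfs.datanode.socket.write.timeout", 600000);

    FileSystem fs = FileSystem.get(conf);
    // Write a few MB, mimicking the reduce task's output stream.
    FSDataOutputStream out = fs.create(new Path("/tmp/hadoop-5324-probe"));
    byte[] buf = new byte[1 << 20];
    for (int i = 0; i < 8; i++) {
      out.write(buf);
    }
    // The hang reported above shows up somewhere in this write/close path
    // while the client is recovering the pipeline from the "bad" datanode.
    out.close();
  }
}
{noformat}
Note that in both recovery attempts in the log, the datanode reported as bad is the same local node <hostname1:port2>, even though the second and third nodes in the pipeline differ.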
> Reduce step hangs while recovering a block from bad datanode
> ------------------------------------------------------------
>
> Key: HADOOP-5324
> URL: https://issues.apache.org/jira/browse/HADOOP-5324
> Project: Hadoop Core
> Issue Type: Bug
> Components: mapred
> Affects Versions: 0.20.0
> Reporter: Ramya R
> Fix For: 0.20.0
>
>
> The reduce step hangs indefinitely while it is trying to recover a block from
> a bad datanode. The node from which the block is being retrieved is alive,
> and its TaskTracker (TT) and DataNode (DN) processes are up and running.