[ https://issues.apache.org/jira/browse/HADOOP-5324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12676613#action_12676613 ]
Ramya R commented on HADOOP-5324:
---------------------------------
Below is the last 4 KB of the log file. After the messages below, the reducer
hangs forever.
{noformat}
2009-02-25 08:56:36,584 INFO org.apache.hadoop.mapred.ReduceTask: GetMapEventsThread exiting
2009-02-25 08:56:36,584 INFO org.apache.hadoop.mapred.ReduceTask: getMapsEventsThread joined.
2009-02-25 08:56:36,585 INFO org.apache.hadoop.mapred.ReduceTask: Closed ram manager
2009-02-25 08:56:36,585 INFO org.apache.hadoop.mapred.ReduceTask: Interleaved on-disk merge complete: 12 files left.
2009-02-25 08:56:36,586 INFO org.apache.hadoop.mapred.ReduceTask: In-memory merge complete: 15 files left.
2009-02-25 08:56:36,589 INFO org.apache.hadoop.mapred.ReduceTask: Keeping 15 segments, 66668024 bytes in memory for intermediate, on-disk merge
2009-02-25 08:56:36,597 INFO org.apache.hadoop.mapred.ReduceTask: Merging 12 files, 1072060708 bytes from disk
2009-02-25 08:56:36,599 INFO org.apache.hadoop.mapred.ReduceTask: Merging 0 segments, 0 bytes from memory into reduce
2009-02-25 08:56:36,599 INFO org.apache.hadoop.mapred.Merger: Merging 27 sorted segments
2009-02-25 08:56:49,155 INFO org.apache.hadoop.mapred.Merger: Merging 18 intermediate segments out of a total of 27
2009-02-25 08:59:44,318 INFO org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 10 segments left of total size: 1072060626 bytes
2009-02-25 09:21:26,758 WARN org.apache.hadoop.hdfs.DFSClient: DFSOutputStream ResponseProcessor exception for block <blk_ID1>java.net.SocketTimeoutException: 69000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/<hostname1:port1> remote=/<hostname1:port2>]
        at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
        at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
        at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
        at java.io.DataInputStream.readFully(DataInputStream.java:178)
        at java.io.DataInputStream.readLong(DataInputStream.java:399)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:2333)
2009-02-25 09:21:26,759 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block <blk_ID1> bad datanode[0] <hostname1:port2>
2009-02-25 09:21:26,759 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block <blk_ID1> in pipeline <hostname1:port2>, <hostname2:port1>, <hostname3:port1>: bad datanode <hostname1:port2>
2009-02-25 09:22:38,114 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.net.SocketTimeoutException: 69000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/<hostname1:port3> remote=/<hostname1:port2>]
2009-02-25 09:22:38,260 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block <blk_ID2>
2009-02-25 09:22:38,263 INFO org.apache.hadoop.hdfs.DFSClient: Waiting to find target node: <hostname1:port2>
2009-02-25 10:06:35,909 WARN org.apache.hadoop.hdfs.DFSClient: DFSOutputStream ResponseProcessor exception for block <blk_ID3>java.net.SocketTimeoutException: 69000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/<hostname1:port4> remote=/<hostname1:port2>]
        at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
        at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
        at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
        at java.io.DataInputStream.readFully(DataInputStream.java:178)
        at java.io.DataInputStream.readLong(DataInputStream.java:399)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:2333)
2009-02-25 10:06:43,847 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block <blk_ID3> bad datanode[0] <hostname1:port2>
2009-02-25 10:06:43,847 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block <blk_ID3> in pipeline <hostname1:port2>, <hostname4:port1>, <hostname5:port1>: bad datanode <hostname1:port2>
{noformat}
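For reference, the repeated "69000 millis timeout" entries look like the DFS client's read timeout firing while DFSOutputStream's ResponseProcessor waits for acks from the first datanode in the pipeline. My assumption (not verified against this cluster's configuration) is that the value is dfs.socket.timeout, default 60000 ms, plus a small per-datanode extension for the 3-node pipeline. Below is a minimal client-side sketch for stretching those timeouts while trying to reproduce the hang; the configuration keys exist in this Hadoop line, but the chosen values and the write loop are only illustrative.
{noformat}
// Hypothetical diagnostic sketch, not part of the fix for this issue: raise the
// DFS client timeouts that the log above keeps hitting, then redo the kind of
// HDFS write the reduce task performs. Values and the 69000 ms arithmetic
// (60 s default + a per-datanode extension for a 3-node pipeline) are assumptions.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class DfsWriteTimeoutProbe {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Read timeout used when waiting on datanode responses (default 60000 ms).
    conf.setInt("dfs.socket.timeout", 120000);
    // Write timeout towards the datanodes (default 480000 ms).
    conf.setInt("dfs.datanode.socket.write.timeout", 600000);

    FileSystem fs = FileSystem.get(conf);
    // Write a few MB, mimicking the reduce task's output stream.
    FSDataOutputStream out = fs.create(new Path("/tmp/hadoop-5324-probe"));
    byte[] buf = new byte[1 << 20];
    for (int i = 0; i < 8; i++) {
      out.write(buf);
    }
    // The hang reported above shows up somewhere in this write/close path
    // while the client is recovering the pipeline from the "bad" datanode.
    out.close();
  }
}
{noformat}
Note that in both recovery attempts in the log, the datanode reported as bad is the same local node <hostname1:port2>, even though the second and third nodes in the pipeline differ.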
> Reduce step hangs while recovering a block from bad datanode
> ------------------------------------------------------------
>
> Key: HADOOP-5324
> URL: https://issues.apache.org/jira/browse/HADOOP-5324
> Project: Hadoop Core
> Issue Type: Bug
> Components: mapred
> Affects Versions: 0.20.0
> Reporter: Ramya R
> Fix For: 0.20.0
>
>
> The reduce step hangs indefinitely while it is trying to recover a block from
> a bad datanode. The node from which the block is being retrieved is alive,
> and its TaskTracker (TT) and DataNode (DN) processes are up and running.