[
https://issues.apache.org/jira/browse/HADOOP-2763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Robert Chansler reassigned HADOOP-2763:
---------------------------------------
Assignee: Tsz Wo (Nicholas), SZE
> Replication Monitor timing out repeatedly
> -----------------------------------------
>
> Key: HADOOP-2763
> URL: https://issues.apache.org/jira/browse/HADOOP-2763
> Project: Hadoop Core
> Issue Type: Bug
> Components: dfs
> Affects Versions: 0.16.0
> Environment: Jan 28 nightly build
> With patches 2095, 2119, and 2723
> Reporter: Christian Kunz
> Assignee: Tsz Wo (Nicholas), SZE
>
> I upgraded a Hadoop installation to the Jan 28 nightly build.
> DFS contains more than 5 million files.
> One hour after the namenode left safemode, fsck reported 5274 under-replicated
> blocks, 25 of them with only a single replica. Three hours later it still
> reported 433 under-replicated blocks, 20 of them with a single replica.
> The namenode log shows the replication monitor timing out repeatedly for the
> same blocks:
> 2008-02-01 03:41:24,184 INFO org.apache.hadoop.dfs.StateChange: BLOCK*
> NameSystem.pendingTransfer: ask datanode to replicate blk_2984271423661664080
> to datanode(s) datanode1 datanode2
> 2008-02-01 03:51:14,104 WARN org.apache.hadoop.fs.FSNamesystem:
> PendingReplicationMonitor timed out block blk_2984271423661664080
> 2008-02-01 03:51:22,303 INFO org.apache.hadoop.dfs.StateChange: BLOCK*
> NameSystem.pendingTransfer: ask datanode to replicate blk_2984271423661664080
> to datanode(s) datanode3 datanode4
> 2008-02-01 04:01:14,150 WARN org.apache.hadoop.fs.FSNamesystem:
> PendingReplicationMonitor timed out block blk_2984271423661664080
> 2008-02-01 04:01:19,344 INFO org.apache.hadoop.dfs.StateChange: BLOCK*
> NameSystem.pendingTransfer: ask datanode to replicate blk_2984271423661664080
> to datanode(s) datanode5 datanode6
> ...
> The source datanode appears to transmit the blocks successfully:
> 2008-02-01 03:42:06,284 INFO org.apache.hadoop.dfs.DataNode: datanode
> Starting thread to transfer block blk_2984271423661664080 to datanode1,
> datanode2
> 2008-02-01 03:42:09,535 INFO org.apache.hadoop.dfs.DataNode:
> datanode:Transmitted block blk_2984271423661664080 to /datanode1
> 2008-02-01 03:52:06,238 INFO org.apache.hadoop.dfs.DataNode: datanode
> Starting thread to transfer block blk_2984271423661664080 to
> datanode3,datanode4
> 2008-02-01 03:52:09,470 INFO org.apache.hadoop.dfs.DataNode:
> datanode:Transmitted block blk_2984271423661664080 to /datanode3
> The destination datanodes seem to have problems receiving these blocks (this
> excerpt is from a different, later attempt):
> 2008-02-01 06:43:06,541 INFO org.apache.hadoop.dfs.DataNode: Receiving block
> blk_2984271423661664080 from /datanode
> 2008-02-01 06:43:09,647 INFO org.apache.hadoop.dfs.DataNode: Exception in
> receiveBlock for block blk_2984271423661664080 java.net.SocketException:
> Connection reset
> 2008-02-01 06:43:09,647 INFO org.apache.hadoop.dfs.DataNode: writeBlock
> blk_2984271423661664080 received exception java.net.SocketException:
> Connection reset
> However, I was able to transfer the block between the two datanodes
> successfully using scp.
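
To see how widespread the repeated timeouts are, the namenode log can be summarized per block. The following is a minimal sketch, not part of the original report: the two regular expressions are modeled on the log excerpts quoted above and may need adjusting for other Hadoop versions, it assumes one log record per line (the excerpts above are wrapped by the mailer), and the function names `summarize_replication` and `repeatedly_timed_out` are hypothetical.

```python
import re
from collections import Counter

# Patterns modeled on the namenode log excerpts in this report (assumption:
# the message wording is the same across the whole log).
ASK_RE = re.compile(r"ask datanode to replicate (blk_-?\d+)")
TIMEOUT_RE = re.compile(r"PendingReplicationMonitor timed out block (blk_-?\d+)")


def summarize_replication(lines):
    """Count replication requests and monitor timeouts per block ID."""
    asked = Counter()
    timed_out = Counter()
    for line in lines:
        match = ASK_RE.search(line)
        if match:
            asked[match.group(1)] += 1
            continue
        match = TIMEOUT_RE.search(line)
        if match:
            timed_out[match.group(1)] += 1
    return asked, timed_out


def repeatedly_timed_out(lines, threshold=2):
    """Return sorted block IDs that timed out at least `threshold` times."""
    _, timed_out = summarize_replication(lines)
    return sorted(blk for blk, count in timed_out.items() if count >= threshold)
```

Passing an open log file (for example `repeatedly_timed_out(open(path))`, where `path` is the namenode log location on your system) lists the blocks that are stuck in the request/timeout cycle described above.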