[ https://issues.apache.org/jira/browse/HDFS-5540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13829622#comment-13829622 ]
Binglin Chang commented on HDFS-5540:
-------------------------------------

Read the [log|https://builds.apache.org/job/PreCommit-HDFS-Build/5504//testReport/org.apache.hadoop.hdfs.server.blockmanagement/TestBlocksWithNotEnoughRacks/testCorruptBlockRereplicatedAcrossRacks/]
{code}
2013-11-20 17:29:58,638 INFO DataNode.clienttrace (BlockSender.java:sendBlock(734)) - src: /127.0.0.1:45980, dest: /127.0.0.1:47450, bytes: 516, op: HDFS_READ, cliID: DFSClient_NONMAPREDUCE_1758168951_1, offset: 0, srvID: DS-655102145-67.195.138.24-45980-1384968592304, blockid: BP-62019746-67.195.138.24-1384968591859:blk_1073741825_1001, duration: 244566
Waiting for 1 corrupt replicas
2013-11-20 17:29:58,660 INFO BlockStateChange (CorruptReplicasMap.java:addToCorruptReplicasMap(88)) - BLOCK NameSystem.addToCorruptReplicasMap: blk_1073741825 added as corrupt on 127.0.0.1:41043 by localhost/127.0.0.1 because client machine reported it
2013-11-20 17:29:59,346 INFO datanode.DataNode (DataXceiver.java:writeBlock(594)) - Received BP-62019746-67.195.138.24-1384968591859:blk_1073741825_1001 src: /127.0.0.1:39752 dest: /127.0.0.1:49340 of size 512
2013-11-20 17:29:59,347 INFO BlockStateChange (BlockManager.java:logAddStoredBlock(2275)) - BLOCK* addStoredBlock: blockMap updated: 127.0.0.1:49340 is added to blk_1073741825_1001 size 512
2013-11-20 17:29:59,347 INFO BlockStateChange (BlockManager.java:invalidateBlock(1092)) - BLOCK* invalidateBlock: blk_1073741825_1001(same as stored) on 127.0.0.1:41043
2013-11-20 17:29:59,640 INFO FSNamesystem.audit (FSNamesystem.java:logAuditMessage(7373)) - allowed=true ugi=jenkins (auth:SIMPLE) ip=/127.0.0.1 cmd=open src=/testFile dst=null perm=null
Waiting for 1 corrupt replicas
{code}
From the log we can see that DFSTestUtil.waitCorruptReplicas checks for corrupt replicas once every second, but HDFS found the corrupt replica and recovered the block in less than a second (marked corrupt at 17:29:58,660, invalidated at 17:29:59,347), between two consecutive checks. As a result, DFSTestUtil.waitCorruptReplicas never observed the corrupt replica, and the test timed out.
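The race described in this comment can be illustrated with a small model. This is only a sketch on a simulated clock, not the actual DFSTestUtil code: the class PollingRaceSketch and its methods pollSees and transientCorruption are hypothetical names, and the 22 ms and 709 ms offsets are derived from the log timestamps, assuming the first check happened at about 17:29:58,638.

```java
import java.util.function.LongToIntFunction;

/** Hypothetical model of the polling race; not the actual DFSTestUtil code. */
final class PollingRaceSketch {

    /**
     * Samples the count at t = 0, intervalMs, 2 * intervalMs, ... on a
     * simulated clock and reports whether any sample equals the expected value.
     */
    static boolean pollSees(LongToIntFunction countAt, int expected,
                            long intervalMs, int maxPolls) {
        for (int i = 0; i < maxPolls; i++) {
            long now = i * intervalMs;
            if (countAt.applyAsInt(now) == expected) {
                return true;
            }
        }
        return false;
    }

    /** Corrupt-replica count that is 1 only within [startMs, endMs), else 0. */
    static LongToIntFunction transientCorruption(long startMs, long endMs) {
        return t -> (t >= startMs && t < endMs) ? 1 : 0;
    }

    public static void main(String[] args) {
        // Offsets relative to the first check (assumed at 17:29:58,638):
        // marked corrupt at 17:29:58,660 (+22 ms),
        // invalidated at 17:29:59,347 (+709 ms).
        LongToIntFunction count = transientCorruption(22, 709);
        System.out.println("1000 ms interval sees corruption: "
                + pollSees(count, 1, 1000, 5));
        System.out.println("100 ms interval sees corruption:  "
                + pollSees(count, 1, 100, 50));
    }
}
```

In this model the one-second poller samples at 0 ms and 1000 ms, on either side of the window in which the replica is marked corrupt, so it never sees a count of 1. A shorter interval narrows the gap but does not remove the race in general, because the test still waits on a state that HDFS can clear at any moment.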

> Fix TestBlocksWithNotEnoughRacks
> --------------------------------
>
>                 Key: HDFS-5540
>                 URL: https://issues.apache.org/jira/browse/HDFS-5540
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Binglin Chang
>            Assignee: Binglin Chang
>
> TestBlocksWithNotEnoughRacks fails with a timeout while waiting for corrupt replicas:
> java.util.concurrent.TimeoutException: Timed out waiting for corrupt replicas. Waiting for 1, but only found 0
> 	at org.apache.hadoop.hdfs.DFSTestUtil.waitCorruptReplicas(DFSTestUtil.java:351)
> 	at org.apache.hadoop.hdfs.server.blockmanagement.TestBlocksWithNotEnoughRacks.testCorruptBlockRereplicatedAcrossRacks(TestBlocksWithNotEnoughRacks.java:219)

--
This message was sent by Atlassian JIRA
(v6.1#6144)