[ https://issues.apache.org/jira/browse/HDFS-3493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14027252#comment-14027252 ]
Andrew Wang commented on HDFS-3493:
-----------------------------------

Thanks for taking this on Juan, the fix looks good. Just a few nitty comments:
* Some whitespace-only changes in BlockManager.
* Some lines are longer than 80 chars.
* Maybe fold {{minReplicationSatisfied &&}} into {{corruptedDuringWrite}}, like how you assign {{hasMoreCorruptReplicas}}, for parity.
* In the test, do we need that sleep(10000)? I'm always wary of sleeps, since they lead to test flakiness.
* I think the comment should also read something like "DNs will detect new dummy blocks on restart". It would also be good to drop a comment about what you're doing with creating dummy blocks.
* I like to put nice conservative timeouts on my tests, e.g. {{@Test(timeout=120000)}}.

+1 pending these, though. [~vinayrpet], maybe you'd like to take a look too?

> Replication does not happen for a block (which is recovered and finalized) to a Datanode that has the same block with an old generation timestamp in RBW
> -----------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-3493
>                 URL: https://issues.apache.org/jira/browse/HDFS-3493
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.0.0-alpha, 2.0.5-alpha
>            Reporter: J.Andreina
>            Assignee: Juan Yu
>         Attachments: HDFS-3493.002.patch, HDFS-3493.003.patch, HDFS-3493.patch
>
>
> Replication factor = 3, block report interval = 1 min; start the NN and 3 DNs.
> Step 1: Write a file without closing it and do hflush (DN1, DN2, DN3 have blk_ts1)
> Step 2: Stop DN3
> Step 3: Recovery happens and the generation timestamp is updated (blk_ts2)
> Step 4: Close the file
> Step 5: blk_ts2 is finalized and available on DN1 and DN2
> Step 6: Now restart DN3 (which has blk_ts1 in RBW)
> From the NN side, no command is issued to DN3 to delete blk_ts1; the NN only marks the block on DN3 as corrupt.
> Replication of blk_ts2 to DN3 does not happen.
> NN logs:
> ========
> {noformat}
> INFO org.apache.hadoop.hdfs.StateChange: BLOCK NameSystem.addToCorruptReplicasMap: duplicate requested for blk_3927215081484173742 to add as corrupt on XX.XX.XX.XX:50276 by /XX.XX.XX.XX because reported RWR replica with genstamp 1007 does not match COMPLETE block's genstamp in block map 1008
> INFO org.apache.hadoop.hdfs.StateChange: BLOCK* processReport: from DatanodeRegistration(XX.XX.XX.XX, storageID=DS-443871816-XX.XX.XX.XX-50276-1336829714197, infoPort=50275, ipcPort=50277, storageInfo=lv=-40;cid=CID-e654ac13-92dc-4f82-a22b-c0b6861d06d7;nsid=2063001898;c=0), blocks: 2, processing time: 1 msecs
> INFO org.apache.hadoop.hdfs.StateChange: BLOCK* Removing block blk_3927215081484173742_1008 from neededReplications as it has enough replicas.
> INFO org.apache.hadoop.hdfs.StateChange: BLOCK NameSystem.addToCorruptReplicasMap: duplicate requested for blk_3927215081484173742 to add as corrupt on XX.XX.XX.XX:50276 by /XX.XX.XX.XX because reported RWR replica with genstamp 1007 does not match COMPLETE block's genstamp in block map 1008
> INFO org.apache.hadoop.hdfs.StateChange: BLOCK* processReport: from DatanodeRegistration(XX.XX.XX.XX, storageID=DS-443871816-XX.XX.XX.XX-50276-1336829714197, infoPort=50275, ipcPort=50277, storageInfo=lv=-40;cid=CID-e654ac13-92dc-4f82-a22b-c0b6861d06d7;nsid=2063001898;c=0), blocks: 2, processing time: 1 msecs
> WARN org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Not able to place enough replicas, still in need of 1 to reach 1
> For more information, please enable DEBUG log level on org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy
> {noformat}
> fsck Report
> ===========
> {noformat}
> /file21: Under replicated BP-1008469586-XX.XX.XX.XX-1336829603103:blk_3927215081484173742_1008. Target Replicas is 3 but found 2 replica(s).
> .Status: HEALTHY
>  Total size: 495 B
>  Total dirs: 1
>  Total files: 3
>  Total blocks (validated): 3 (avg. block size 165 B)
>  Minimally replicated blocks: 3 (100.0 %)
>  Over-replicated blocks: 0 (0.0 %)
>  Under-replicated blocks: 1 (33.333332 %)
>  Mis-replicated blocks: 0 (0.0 %)
>  Default replication factor: 1
>  Average block replication: 2.0
>  Corrupt blocks: 0
>  Missing replicas: 1 (14.285714 %)
>  Number of data-nodes: 3
>  Number of racks: 1
> FSCK ended at Sun May 13 09:49:05 IST 2012 in 9 milliseconds
> The filesystem under path '/' is HEALTHY
> {noformat}

--
This message was sent by Atlassian JIRA
(v6.2#6252)
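As an aside on the review comment about {{sleep(10000)}}: a minimal sketch of the "poll for a condition instead of sleeping a fixed time" pattern being suggested. The {{waitFor}} helper here is hypothetical and written for illustration; in real Hadoop tests, {{GenericTestUtils.waitFor}} serves this role. The test returns as soon as the condition holds, rather than always paying the full sleep.

```java
import java.util.function.BooleanSupplier;

public class Main {
  // Hypothetical helper: poll `check` every `intervalMs` until it is
  // true, failing if `timeoutMs` elapses first. Same idea as Hadoop's
  // GenericTestUtils.waitFor.
  static void waitFor(BooleanSupplier check, long intervalMs, long timeoutMs)
      throws InterruptedException {
    long deadline = System.currentTimeMillis() + timeoutMs;
    while (!check.getAsBoolean()) {
      if (System.currentTimeMillis() > deadline) {
        throw new IllegalStateException("Timed out waiting for condition");
      }
      Thread.sleep(intervalMs);
    }
  }

  public static void main(String[] args) throws InterruptedException {
    long start = System.currentTimeMillis();
    // The condition becomes true after ~200 ms; the wait returns then,
    // instead of pausing for a fixed 10 seconds.
    waitFor(() -> System.currentTimeMillis() - start > 200, 50, 5000);
    System.out.println("condition met");
  }
}
```

A generous outer timeout (like the suggested {{@Test(timeout=120000)}}) still belongs on the test itself, so a hung wait fails the test instead of hanging the build.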