[ https://issues.apache.org/jira/browse/HDFS-3493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14027252#comment-14027252 ]

Andrew Wang commented on HDFS-3493:
-----------------------------------

Thanks for taking this on, Juan; the fix looks good. Just a few nitty comments:

* some whitespace-only changes in BlockManager
* some lines longer than 80 chars
* Maybe fold {{minReplicationSatisfied &&}} into {{corruptedDuringWrite}}, the same way you assign {{hasMoreCorruptReplicas}}, for parity.
* In the test, do we need that sleep(10000)? I'm always wary of sleeps, since 
they lead to test flakiness.
* I think the comment should also say something like "DNs will detect new dummy blocks on restart". It would also be good to add a comment explaining what you're doing by creating the dummy blocks.
* I like to put nice conservative timeouts on my tests, e.g. {{@Test(timeout=120000)}}.
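On the sleep(10000) point: one common alternative to a fixed sleep is to poll for the expected condition with a deadline. The following is a minimal sketch, assuming Java 8; {{WaitUtil.waitFor}} is a hypothetical helper (not part of the patch), and Hadoop's own test code has a comparable {{GenericTestUtils.waitFor}}.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.BooleanSupplier;

/**
 * Hypothetical test helper (not from the patch): polls a condition until it
 * holds or a deadline passes, instead of sleeping for a fixed period.
 */
final class WaitUtil {
  private WaitUtil() {}

  static void waitFor(BooleanSupplier check, long checkEveryMillis,
      long waitForMillis) throws TimeoutException, InterruptedException {
    long deadline = System.nanoTime()
        + TimeUnit.MILLISECONDS.toNanos(waitForMillis);
    // Re-check until the condition holds; fail loudly once the deadline passes.
    while (!check.getAsBoolean()) {
      if (System.nanoTime() - deadline >= 0) {
        throw new TimeoutException(
            "Condition not met within " + waitForMillis + " ms");
      }
      Thread.sleep(checkEveryMillis);
    }
  }

  public static void main(String[] args) throws Exception {
    long start = System.nanoTime();
    // Stand-in condition that becomes true after ~200 ms; in a real test it
    // would check something like the replica count the NN reports.
    waitFor(() -> System.nanoTime() - start > TimeUnit.MILLISECONDS.toNanos(200),
        50, 10_000);
    System.out.println("condition met");
  }
}
```

The test returns as soon as the condition holds instead of always paying the full sleep, and a 10 s polling budget stays well inside a {{@Test(timeout=120000)}} limit.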

+1 pending these though. [~vinayrpet], maybe you'd like to take a look too?

> Replication is not happened for the block (which is recovered and in 
> finalized) to the Datanode which has got the same block with old generation 
> timestamp in RBW
> -----------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-3493
>                 URL: https://issues.apache.org/jira/browse/HDFS-3493
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.0.0-alpha, 2.0.5-alpha
>            Reporter: J.Andreina
>            Assignee: Juan Yu
>         Attachments: HDFS-3493.002.patch, HDFS-3493.003.patch, HDFS-3493.patch
>
>
> Setup: replication factor = 3, block report interval = 1 min; start the NN and 3 DNs.
> Step 1: Write a file, call hflush, and do not close it (DN1, DN2 and DN3 have blk_ts1).
> Step 2: Stop DN3.
> Step 3: Recovery happens and the generation timestamp is updated (blk_ts2).
> Step 4: Close the file.
> Step 5: blk_ts2 is finalized and available on DN1 and DN2.
> Step 6: Restart DN3 (which still has blk_ts1 in RBW).
> The NN does not issue any command to DN3 to delete blk_ts1; it only marks DN3's replica as corrupt.
> blk_ts2 is never replicated to DN3.
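To make the steps above concrete, here is a hypothetical, simplified model of the NN's choice when DN3 reports its stale RBW/RWR replica. It is not the actual BlockManager code. The flag names only mirror the ones discussed in the review, and minReplication = 1 is an assumed default. In this model, without a "corrupted during write" rule the stale replica is neither deleted nor counted as live. With only three DNs there is then no valid replication target, which appears consistent with the "Not able to place enough replicas" warning below.

```java
/**
 * Hypothetical, simplified model of the NN's decision for a replica reported
 * as corrupt. NOT the real BlockManager logic; names mirror the review only.
 */
final class CorruptReplicaDecision {
  private CorruptReplicaDecision() {}

  /**
   * Returns true if the corrupt replica should be invalidated (deleted) on the
   * reporting DN, so that DN can later receive a fresh copy of the block.
   */
  static boolean shouldInvalidate(int liveReplicas, int corruptReplicas,
      int expectedReplication, int minReplication, long storedGenStamp,
      long reportedGenStamp, boolean handleCorruptedDuringWrite) {
    boolean hasEnoughLiveReplicas = liveReplicas >= expectedReplication;
    boolean minReplicationSatisfied = liveReplicas >= minReplication;
    // Live plus corrupt copies exceed the replication factor.
    boolean hasMoreCorruptReplicas = minReplicationSatisfied
        && liveReplicas + corruptReplicas > expectedReplication;
    // Stale replica left by an interrupted write: its genstamp is older than
    // the stored block's. Guarded by minReplicationSatisfied, like the flag above.
    boolean corruptedDuringWrite = minReplicationSatisfied
        && storedGenStamp > reportedGenStamp;
    return hasEnoughLiveReplicas || hasMoreCorruptReplicas
        || (handleCorruptedDuringWrite && corruptedDuringWrite);
  }

  public static void main(String[] args) {
    // Scenario from the report: 2 live (DN1, DN2), 1 corrupt (DN3), target 3,
    // stored genstamp 1008, DN3 reports genstamp 1007.
    System.out.println(shouldInvalidate(2, 1, 3, 1, 1008L, 1007L, false)); // false
    System.out.println(shouldInvalidate(2, 1, 3, 1, 1008L, 1007L, true));  // true
  }
}
```

Folding {{minReplicationSatisfied &&}} into both flags keeps the model from ever deleting a replica when no live copy remains.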
> NN logs:
> ========
> {noformat}
> INFO org.apache.hadoop.hdfs.StateChange: BLOCK NameSystem.addToCorruptReplicasMap: duplicate requested for blk_3927215081484173742 to add as corrupt on XX.XX.XX.XX:50276 by /XX.XX.XX.XX because reported RWR replica with genstamp 1007 does not match COMPLETE block's genstamp in block map 1008
> INFO org.apache.hadoop.hdfs.StateChange: BLOCK* processReport: from DatanodeRegistration(XX.XX.XX.XX, storageID=DS-443871816-XX.XX.XX.XX-50276-1336829714197, infoPort=50275, ipcPort=50277, storageInfo=lv=-40;cid=CID-e654ac13-92dc-4f82-a22b-c0b6861d06d7;nsid=2063001898;c=0), blocks: 2, processing time: 1 msecs
> INFO org.apache.hadoop.hdfs.StateChange: BLOCK* Removing block blk_3927215081484173742_1008 from neededReplications as it has enough replicas.
> INFO org.apache.hadoop.hdfs.StateChange: BLOCK NameSystem.addToCorruptReplicasMap: duplicate requested for blk_3927215081484173742 to add as corrupt on XX.XX.XX.XX:50276 by /XX.XX.XX.XX because reported RWR replica with genstamp 1007 does not match COMPLETE block's genstamp in block map 1008
> INFO org.apache.hadoop.hdfs.StateChange: BLOCK* processReport: from DatanodeRegistration(XX.XX.XX.XX, storageID=DS-443871816-XX.XX.XX.XX-50276-1336829714197, infoPort=50275, ipcPort=50277, storageInfo=lv=-40;cid=CID-e654ac13-92dc-4f82-a22b-c0b6861d06d7;nsid=2063001898;c=0), blocks: 2, processing time: 1 msecs
> WARN org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Not able to place enough replicas, still in need of 1 to reach 1
> For more information, please enable DEBUG log level on org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy
> {noformat}
> fsck Report
> ===========
> {noformat}
> /file21:  Under replicated BP-1008469586-XX.XX.XX.XX-1336829603103:blk_3927215081484173742_1008. Target Replicas is 3 but found 2 replica(s).
> .Status: HEALTHY
>  Total size:  495 B
>  Total dirs:  1
>  Total files: 3
>  Total blocks (validated):    3 (avg. block size 165 B)
>  Minimally replicated blocks: 3 (100.0 %)
>  Over-replicated blocks:      0 (0.0 %)
>  Under-replicated blocks:     1 (33.333332 %)
>  Mis-replicated blocks:               0 (0.0 %)
>  Default replication factor:  1
>  Average block replication:   2.0
>  Corrupt blocks:              0
>  Missing replicas:            1 (14.285714 %)
>  Number of data-nodes:                3
>  Number of racks:             1
> FSCK ended at Sun May 13 09:49:05 IST 2012 in 9 milliseconds
> The filesystem under path '/' is HEALTHY
> {noformat}
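The percentages in the fsck summary above can be reproduced with plain float arithmetic. The sketch below rests on two assumptions. First, fsck divides in float and prints the result via {{Float.toString}}. Second, the missing-replica percentage is taken over expected replicas (found + missing). The found count of 6 is derived from the report's average block replication of 2.0 over 3 blocks.

```java
/**
 * Recomputes the percentages in the fsck summary. Assumptions: float division
 * printed via Float.toString, and missing replicas measured against expected
 * replicas (found + missing).
 */
final class FsckPercentages {
  private FsckPercentages() {}

  // part * 100f forces float division, matching the float-looking output.
  static String percent(int part, int whole) {
    return Float.toString(part * 100f / whole);
  }

  public static void main(String[] args) {
    int totalBlocks = 3;
    int underReplicatedBlocks = 1;
    int foundReplicas = 6;   // average block replication 2.0 x 3 blocks
    int missingReplicas = 1; // the block with target 3 but only 2 replicas
    System.out.println(percent(underReplicatedBlocks, totalBlocks)); // 33.333332
    System.out.println(percent(missingReplicas, foundReplicas + missingReplicas)); // 14.285714
  }
}
```

The odd-looking "33.333332 %" is simply 1/3 rounded to a float, and "14.285714 %" corresponds to 1 missing replica out of 7 expected.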



--
This message was sent by Atlassian JIRA
(v6.2#6252)
