[ https://issues.apache.org/jira/browse/HDFS-3493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14027252#comment-14027252 ]
Andrew Wang commented on HDFS-3493:
-----------------------------------

Thanks for taking this on Juan, the fix looks good. Just a few nitty comments:
* Some whitespace-only changes in BlockManager.
* Some lines are longer than 80 chars.
* Maybe fold {{minReplicationSatisfied &&}} into {{corruptedDuringWrite}}, like how you assign {{hasMoreCorruptReplicas}}, for parity.
* In the test, do we need that sleep(10000)? I'm always wary of sleeps, since they lead to test flakiness.
* I think the comment should also read something like "DNs will detect new dummy blocks on restart". It would also be good to drop a comment about what you're doing with creating dummy blocks.
* I like to put nice conservative timeouts on my tests, e.g. {{@Test(timeout=120000)}}.

+1 pending these, though. [~vinayrpet], maybe you'd like to take a look too?

> Replication does not happen for a block (which is recovered and finalized) to a Datanode that has the same block with an old generation timestamp in RBW
> -----------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-3493
>                 URL: https://issues.apache.org/jira/browse/HDFS-3493
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.0.0-alpha, 2.0.5-alpha
>            Reporter: J.Andreina
>            Assignee: Juan Yu
>         Attachments: HDFS-3493.002.patch, HDFS-3493.003.patch, HDFS-3493.patch
>
>
> Replication factor = 3, block report interval = 1 min; start the NN and 3 DNs.
> Step 1: Write a file without closing it and do hflush (DN1, DN2, DN3 have blk_ts1)
> Step 2: Stop DN3
> Step 3: Recovery happens and the generation timestamp is updated (blk_ts2)
> Step 4: Close the file
> Step 5: blk_ts2 is finalized and available on DN1 and DN2
> Step 6: Now restart DN3 (which has blk_ts1 in RBW)
> From the NN side, no command is issued to DN3 to delete blk_ts1; the NN only marks the block on DN3 as corrupt.
> Replication of blk_ts2 to DN3 does not happen.
> NN logs:
> ========
> {noformat}
> INFO org.apache.hadoop.hdfs.StateChange: BLOCK NameSystem.addToCorruptReplicasMap: duplicate requested for blk_3927215081484173742 to add as corrupt on XX.XX.XX.XX:50276 by /XX.XX.XX.XX because reported RWR replica with genstamp 1007 does not match COMPLETE block's genstamp in block map 1008
> INFO org.apache.hadoop.hdfs.StateChange: BLOCK* processReport: from DatanodeRegistration(XX.XX.XX.XX, storageID=DS-443871816-XX.XX.XX.XX-50276-1336829714197, infoPort=50275, ipcPort=50277, storageInfo=lv=-40;cid=CID-e654ac13-92dc-4f82-a22b-c0b6861d06d7;nsid=2063001898;c=0), blocks: 2, processing time: 1 msecs
> INFO org.apache.hadoop.hdfs.StateChange: BLOCK* Removing block blk_3927215081484173742_1008 from neededReplications as it has enough replicas.
> INFO org.apache.hadoop.hdfs.StateChange: BLOCK NameSystem.addToCorruptReplicasMap: duplicate requested for blk_3927215081484173742 to add as corrupt on XX.XX.XX.XX:50276 by /XX.XX.XX.XX because reported RWR replica with genstamp 1007 does not match COMPLETE block's genstamp in block map 1008
> INFO org.apache.hadoop.hdfs.StateChange: BLOCK* processReport: from DatanodeRegistration(XX.XX.XX.XX, storageID=DS-443871816-XX.XX.XX.XX-50276-1336829714197, infoPort=50275, ipcPort=50277, storageInfo=lv=-40;cid=CID-e654ac13-92dc-4f82-a22b-c0b6861d06d7;nsid=2063001898;c=0), blocks: 2, processing time: 1 msecs
> WARN org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Not able to place enough replicas, still in need of 1 to reach 1
> For more information, please enable DEBUG log level on org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy
> {noformat}
> fsck Report
> ===========
> {noformat}
> /file21: Under replicated BP-1008469586-XX.XX.XX.XX-1336829603103:blk_3927215081484173742_1008. Target Replicas is 3 but found 2 replica(s).
> .Status: HEALTHY
>  Total size: 495 B
>  Total dirs: 1
>  Total files: 3
>  Total blocks (validated): 3 (avg. block size 165 B)
>  Minimally replicated blocks: 3 (100.0 %)
>  Over-replicated blocks: 0 (0.0 %)
>  Under-replicated blocks: 1 (33.333332 %)
>  Mis-replicated blocks: 0 (0.0 %)
>  Default replication factor: 1
>  Average block replication: 2.0
>  Corrupt blocks: 0
>  Missing replicas: 1 (14.285714 %)
>  Number of data-nodes: 3
>  Number of racks: 1
> FSCK ended at Sun May 13 09:49:05 IST 2012 in 9 milliseconds
> The filesystem under path '/' is HEALTHY
> {noformat}

--
This message was sent by Atlassian JIRA
(v6.2#6252)
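As an aside on the review comment about {{sleep(10000)}}: a minimal sketch of the "poll for a condition instead of sleeping a fixed time" pattern being suggested. The {{waitFor}} helper here is hypothetical and written for illustration; in real Hadoop tests, {{GenericTestUtils.waitFor}} serves this role. The test returns as soon as the condition holds, rather than always paying the full sleep.

```java
import java.util.function.BooleanSupplier;

public class Main {
  // Hypothetical helper: poll `check` every `intervalMs` until it is
  // true, failing if `timeoutMs` elapses first. Same idea as Hadoop's
  // GenericTestUtils.waitFor.
  static void waitFor(BooleanSupplier check, long intervalMs, long timeoutMs)
      throws InterruptedException {
    long deadline = System.currentTimeMillis() + timeoutMs;
    while (!check.getAsBoolean()) {
      if (System.currentTimeMillis() > deadline) {
        throw new IllegalStateException("Timed out waiting for condition");
      }
      Thread.sleep(intervalMs);
    }
  }

  public static void main(String[] args) throws InterruptedException {
    long start = System.currentTimeMillis();
    // The condition becomes true after ~200 ms; the wait returns then,
    // instead of pausing for a fixed 10 seconds.
    waitFor(() -> System.currentTimeMillis() - start > 200, 50, 5000);
    System.out.println("condition met");
  }
}
```

A generous outer timeout (like the suggested {{@Test(timeout=120000)}}) still belongs on the test itself, so a hung wait fails the test instead of hanging the build.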