[ https://issues.apache.org/jira/browse/HDFS-14637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16911529#comment-16911529 ]

Wei-Chiu Chuang commented on HDFS-14637:
----------------------------------------

Excellent work! I had a quick glance at the patch and I think I now understand 
the gist of it.

Will try to get another look at this patch later today.

> Namenode may not replicate blocks to meet the policy after enabling 
> upgradeDomain
> ---------------------------------------------------------------------------------
>
>                 Key: HDFS-14637
>                 URL: https://issues.apache.org/jira/browse/HDFS-14637
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 3.3.0
>            Reporter: Stephen O'Donnell
>            Assignee: Stephen O'Donnell
>            Priority: Major
>         Attachments: HDFS-14637.001.patch, HDFS-14637.002.patch, 
> HDFS-14637.003.patch, HDFS-14637.004.patch, HDFS-14637.005.patch
>
>
> After changing the network topology or placement policy on a cluster and 
> restarting the namenode, the namenode will scan all blocks on the cluster at 
> startup and check whether they meet the current placement policy. If they do 
> not, they are added to the replication queue and the namenode will arrange for 
> them to be replicated so that the placement policy is satisfied.
> If you start with a cluster with no UpgradeDomain, and then enable 
> UpgradeDomain, then on restart the NN does notice that all the blocks violate 
> the placement policy and it adds them to the replication queue. However, I 
> believe there are some issues in the logic that prevent the blocks from 
> replicating, depending on the setup:
> With UD enabled, but no racks configured, and possibly also on a 2 rack 
> cluster, the queued replication work never makes any progress, because 
> blockManager.validateReconstructionWork() checks whether the new replica 
> increases the number of racks, and if it does not, it skips the work and 
> tries again later.
> {code:java}
> DatanodeStorageInfo[] targets = rw.getTargets();
> if ((numReplicas.liveReplicas() >= requiredRedundancy) &&
>     (!isPlacementPolicySatisfied(block)) ) {
>   if (!isInNewRack(rw.getSrcNodes(), targets[0].getDatanodeDescriptor())) {
>     // No use continuing, unless a new rack in this case
>     return false;
>   }
>   // mark that the reconstruction work is to replicate internal block to a
>   // new rack.
>   rw.setNotEnoughRack();
> }
> {code}
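> As a rough, self-contained illustration of why that check defeats the 
> upgrade-domain repair when no rack mapping is configured (hypothetical helper 
> and class names, not the actual BlockManager code): every node reports 
> /default-rack, so a rack-count based check never sees an improvement, even 
> when the candidate target would add a new upgrade domain.
> {code:java}
> import java.util.Arrays;
> import java.util.HashSet;
> import java.util.List;
> import java.util.Set;
>
> // Hypothetical sketch, not the real BlockManager code.
> public class IsInNewRackExample {
>   // Mirrors the spirit of isInNewRack(): true only if the target's rack
>   // differs from every rack that already holds a replica.
>   static boolean isInNewRack(List<String> srcRacks, String targetRack) {
>     return !srcRacks.contains(targetRack);
>   }
>
>   public static void main(String[] args) {
>     // No rack mapping configured: every node is on /default-rack.
>     List<String> existingReplicaRacks =
>         Arrays.asList("/default-rack", "/default-rack", "/default-rack");
>     String targetRack = "/default-rack"; // the chosen target is as well
>
>     // Upgrade domains of the same nodes: all replicas share one domain,
>     // while the target would add a second one.
>     Set<String> existingDomains = new HashSet<>(Arrays.asList("ud1"));
>     String targetDomain = "ud2";
>
>     System.out.println("isInNewRack = "
>         + isInNewRack(existingReplicaRacks, targetRack));   // false -> work skipped
>     System.out.println("target adds new upgrade domain = "
>         + !existingDomains.contains(targetDomain));         // true -> policy would improve
>   }
> }
> {code}
> So on a flat topology the block stays in the queue and is retried, but the 
> upgrade-domain violation is never repaired by this path.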
> Additionally, blockManager.scheduleReconstruction() has some logic that sets 
> the number of new replicas required to one if the live replicas >= 
> requiredRedundancy:
> {code:java}
> int additionalReplRequired;
> if (numReplicas.liveReplicas() < requiredRedundancy) {
>   additionalReplRequired = requiredRedundancy - numReplicas.liveReplicas()
>       - pendingNum;
> } else {
>   additionalReplRequired = 1; // Needed on a new rack
> }
> {code}
> With UD, it is possible for 2 new replicas to be needed to meet the block 
> placement policy, if all existing replicas are on nodes in the same upgrade 
> domain. For traditional '2 rack redundancy', only 1 new replica would ever 
> have been needed in this scenario.
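> As a rough illustration of the arithmetic (assuming an upgrade-domain factor 
> of 3, and hypothetical helper names rather than the real 
> BlockPlacementPolicyWithUpgradeDomain API), a sketch of how many additional 
> replicas an upgrade-domain policy can demand:
> {code:java}
> import java.util.Arrays;
> import java.util.HashSet;
> import java.util.List;
> import java.util.Set;
>
> // Hypothetical sketch of the upgrade-domain redundancy arithmetic.
> public class UpgradeDomainReplicasExample {
>   // Assume the UD policy wants replicas spread over
>   // min(replication, upgradeDomainFactor) distinct upgrade domains.
>   static int additionalReplicasNeeded(List<String> replicaDomains,
>                                       int replication, int upgradeDomainFactor) {
>     Set<String> distinct = new HashSet<>(replicaDomains);
>     int required = Math.min(replication, upgradeDomainFactor);
>     return Math.max(0, required - distinct.size());
>   }
>
>   public static void main(String[] args) {
>     // 3 live replicas, but all of them sit in the same upgrade domain.
>     List<String> domains = Arrays.asList("ud1", "ud1", "ud1");
>     System.out.println(additionalReplicasNeeded(domains, 3, 3)); // prints 2, not 1
>   }
> }
> {code}
> So a pass that only schedules one extra replica cannot always bring such a 
> block into compliance in a single round.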



--
This message was sent by Atlassian Jira
(v8.3.2#803003)
