[ https://issues.apache.org/jira/browse/HDFS-14637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16911529#comment-16911529 ]
Wei-Chiu Chuang commented on HDFS-14637:
----------------------------------------
Excellent work! I had a quick glance at the patch and I think I now understand
the gist of it.
Will try to get another look at this patch later today.
> Namenode may not replicate blocks to meet the policy after enabling
> upgradeDomain
> ---------------------------------------------------------------------------------
>
> Key: HDFS-14637
> URL: https://issues.apache.org/jira/browse/HDFS-14637
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Affects Versions: 3.3.0
> Reporter: Stephen O'Donnell
> Assignee: Stephen O'Donnell
> Priority: Major
> Attachments: HDFS-14637.001.patch, HDFS-14637.002.patch,
> HDFS-14637.003.patch, HDFS-14637.004.patch, HDFS-14637.005.patch
>
>
> After changing the network topology or placement policy on a cluster and
> restarting the namenode, the namenode will scan all blocks on the cluster at
> startup, and check if they meet the current placement policy. If they do not,
> they are added to the replication queue and the namenode will arrange for
> them to be replicated to ensure the placement policy is used.
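> A minimal standalone sketch of the behaviour described above (hypothetical
> names, not the actual NameNode code): on restart, every block is checked
> against the active placement policy and queued for reconstruction if it is
> mis-placed.
> {code:java}
> import java.util.ArrayDeque;
> import java.util.List;
> import java.util.Queue;
> import java.util.function.Predicate;
>
> public class StartupScanSketch {
>   static final Queue<String> replicationQueue = new ArrayDeque<>();
>
>   // policySatisfied stands in for isPlacementPolicySatisfied(block).
>   static void scanBlocks(List<String> blocks,
>                          Predicate<String> policySatisfied) {
>     for (String block : blocks) {
>       if (!policySatisfied.test(block)) {
>         replicationQueue.add(block); // mis-placed: schedule reconstruction
>       }
>     }
>   }
>
>   public static void main(String[] args) {
>     // With upgrade domains newly enabled, every block violates the policy.
>     scanBlocks(List.of("blk_1", "blk_2"), b -> false);
>     System.out.println(replicationQueue); // [blk_1, blk_2]
>   }
> }
> {code}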
> If you start with a cluster with no UpgradeDomain and then enable
> UpgradeDomain, then on restart the NN does notice that all the blocks violate
> the placement policy and adds them to the replication queue. However, I
> believe there are some issues in the logic that prevent the blocks from
> replicating, depending on the setup:
> With UD enabled but no racks configured, and possibly also on a 2-rack
> cluster, the queued replication work never makes any progress: in
> BlockManager.validateReconstructionWork(), the NN checks whether the new
> replica increases the number of racks, and if it does not, it skips the work
> and retries later.
> {code:java}
> DatanodeStorageInfo[] targets = rw.getTargets();
> if ((numReplicas.liveReplicas() >= requiredRedundancy) &&
>     (!isPlacementPolicySatisfied(block))) {
>   if (!isInNewRack(rw.getSrcNodes(), targets[0].getDatanodeDescriptor())) {
>     // No use continuing, unless a new rack in this case
>     return false;
>   }
>   // mark that the reconstruction work is to replicate internal block to a
>   // new rack.
>   rw.setNotEnoughRack();
> }
> {code}
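> To make the failure mode concrete, here is a simplified, standalone sketch
> (the Node model and domain values are hypothetical, not the BlockManager
> code): a candidate target that fixes the upgrade domain violation but does
> not add a new rack is rejected by the rack-only check, so the queued work
> can never complete.
> {code:java}
> import java.util.Arrays;
> import java.util.HashSet;
> import java.util.Set;
>
> public class NewRackCheckSketch {
>   // Hypothetical node model: each datanode has a rack and an upgrade domain.
>   record Node(String rack, String upgradeDomain) {}
>
>   // Mirrors the spirit of isInNewRack(): accept the target only if its rack
>   // differs from the rack of every existing source replica.
>   static boolean isInNewRack(Node[] srcs, Node target) {
>     return Arrays.stream(srcs)
>         .noneMatch(s -> s.rack().equals(target.rack()));
>   }
>
>   public static void main(String[] args) {
>     // All three replicas sit in upgrade domain "ud1" on a 2-rack cluster.
>     Node[] srcs = {
>         new Node("/rack1", "ud1"),
>         new Node("/rack1", "ud1"),
>         new Node("/rack2", "ud1")
>     };
>     // This target would fix the upgrade domain violation (new domain "ud2"),
>     // but both racks are already in use, so no target can pass the check.
>     Node target = new Node("/rack2", "ud2");
>
>     Set<String> domains = new HashSet<>();
>     for (Node s : srcs) {
>       domains.add(s.upgradeDomain());
>     }
>     System.out.println("distinct upgrade domains: " + domains.size()); // 1
>     System.out.println("passes rack-only check: "
>         + isInNewRack(srcs, target)); // false -> work is skipped forever
>   }
> }
> {code}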
> Additionally, in BlockManager.scheduleReconstruction() there is some logic
> that sets the number of new replicas required to one if the live replicas >=
> requiredRedundancy:
> {code:java}
> int additionalReplRequired;
> if (numReplicas.liveReplicas() < requiredRedundancy) {
>   additionalReplRequired = requiredRedundancy - numReplicas.liveReplicas()
>       - pendingNum;
> } else {
>   additionalReplRequired = 1; // Needed on a new rack
> }
> {code}
> With UD, it is possible for 2 new replicas to be needed to meet the block
> placement policy, if all existing replicas are on nodes in the same upgrade
> domain. For traditional '2 rack redundancy', only 1 new replica would ever
> have been needed in this scenario.
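> As a rough illustration of how the shortfall could be sized instead (a
> sketch under assumptions, not the actual patch), count the distinct upgrade
> domains among the live replicas and compare against
> min(replication, upgradeDomainFactor), the invariant the upgrade domain
> policy enforces (the factor defaults to 3); the method name here is
> hypothetical.
> {code:java}
> import java.util.Arrays;
>
> public class AdditionalReplicasSketch {
>   // Extra replicas needed purely to satisfy the upgrade domain policy,
>   // given the upgrade domains of the current live replicas.
>   static int additionalReplRequired(String[] liveReplicaDomains,
>                                     int replication, int upgradeDomainFactor) {
>     long distinct = Arrays.stream(liveReplicaDomains).distinct().count();
>     int required = Math.min(replication, upgradeDomainFactor);
>     return (int) Math.max(0, required - distinct);
>   }
>
>   public static void main(String[] args) {
>     // All 3 replicas in one domain: 2 more replicas are needed, not 1.
>     System.out.println(additionalReplRequired(
>         new String[] {"ud1", "ud1", "ud1"}, 3, 3)); // 2
>     // Two domains already covered: only 1 more replica is needed.
>     System.out.println(additionalReplRequired(
>         new String[] {"ud1", "ud1", "ud2"}, 3, 3)); // 1
>   }
> }
> {code}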