[ 
https://issues.apache.org/jira/browse/HDFS-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16880506#comment-16880506
 ] 

Stephen O'Donnell commented on HDFS-8789:
-----------------------------------------

I've been looking into upgrade domains (UD). To try to answer this question:

{quote}

What happens on NN restart (to be precise, during the replication queues 
initialization)? 
 Each block will be checked against 
{{BlockPlacementPolicy#verifyBlockPlacement()}} and will get added to the 
replication queue. When calculating repl work, {{chooseTarget()}} is supposed 
to help correct any violation. This happens when the network topology or the 
placement policy changes. Does it also work for upgrade domain block placement 
policy?

{quote}

What I have found is that if you start with a cluster without UD and then 
enable UD, on restart the NN does notice that all the blocks violate the 
placement policy, and it adds them to the replication queue. However, I believe 
there are some issues in the logic that is supposed to correct those violations 
in that area of the code.
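To illustrate why every pre-existing block gets flagged, here is a simplified model of a per-block placement check. It is not the real {{BlockPlacementPolicy#verifyBlockPlacement()}} code: the {{Replica}} type, the method names, and the rules themselves (replicas must span min(replication, upgradeDomainFactor) distinct UDs and min(2, clusterRacks, replication) racks) are assumptions made for illustration.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified model of a per-block check for an
// upgrade-domain-aware placement policy. NOT the real HDFS code;
// all names and rules here are illustrative assumptions.
final class UpgradeDomainPlacementCheck {

  // Hypothetical replica location: rack and upgrade domain of the DataNode.
  record Replica(String rack, String upgradeDomain) {}

  // Assumption: replicas must span at least
  // min(replication, upgradeDomainFactor) distinct upgrade domains.
  static boolean upgradeDomainsSatisfied(
      List<Replica> replicas, int replication, int upgradeDomainFactor) {
    Set<String> domains = new HashSet<>();
    for (Replica r : replicas) {
      domains.add(r.upgradeDomain());
    }
    return domains.size() >= Math.min(replication, upgradeDomainFactor);
  }

  // Assumption: replicas must span min(2, clusterRacks, replication) racks.
  static boolean racksSatisfied(
      List<Replica> replicas, int clusterRacks, int replication) {
    Set<String> racks = new HashSet<>();
    for (Replica r : replicas) {
      racks.add(r.rack());
    }
    return racks.size() >= Math.min(2, Math.min(clusterRacks, replication));
  }

  // A block is mis-placed if either rule fails; in this model, such blocks
  // are the ones queued for reconstruction when the NN restarts.
  static boolean placementSatisfied(
      List<Replica> replicas, int replication,
      int upgradeDomainFactor, int clusterRacks) {
    return racksSatisfied(replicas, clusterRacks, replication)
        && upgradeDomainsSatisfied(replicas, replication, upgradeDomainFactor);
  }

  public static void main(String[] args) {
    // Block placed without regard to UD: all three replicas share ud1.
    List<Replica> before = List.of(
        new Replica("/default-rack", "ud1"),
        new Replica("/default-rack", "ud1"),
        new Replica("/default-rack", "ud1"));
    // The same block once its replicas are spread across three UDs.
    List<Replica> after = List.of(
        new Replica("/default-rack", "ud1"),
        new Replica("/default-rack", "ud2"),
        new Replica("/default-rack", "ud3"));
    System.out.println(placementSatisfied(before, 3, 3, 1)); // false
    System.out.println(placementSatisfied(after, 3, 3, 1));  // true
  }
}
```

On a single-rack cluster the rack rule passes trivially, so in this model it is the UD rule alone that flags the old blocks.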

There are at least two issues I have come across:
 # With UD enabled but no racks configured, the queued replication work never 
makes any progress, because {{blockManager.validateReconstructionWork()}} checks 
whether the new replica increases the number of racks, and if it does not, it 
skips the work and tries again later.
 # In {{blockManager.scheduleReconstruction()}} there is some logic that says if 
{{numReplicas.liveReplicas() >= requiredRedundancy}}, then only 1 new replica is 
needed. That is sufficient for rack redundancy (we always want 2 racks, so one 
replica on a new rack fixes it), but for UD we may need 2 new replicas if all 3 
existing replicas are in the same UD.
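A minimal sketch of both issues, modelled as plain functions rather than the real BlockManager code. The class and method names are hypothetical, and the UD requirement used here (one new replica per missing distinct UD) is an assumption for illustration.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified model of the two reconstruction code paths
// described above. Names are illustrative, not the real BlockManager API.
final class ReconstructionModel {

  // Issue 2: once a block has enough live replicas, a placement violation
  // only ever schedules one additional replica.
  static int additionalReplicasScheduled(int liveReplicas, int requiredRedundancy) {
    if (liveReplicas < requiredRedundancy) {
      return requiredRedundancy - liveReplicas;
    }
    return 1; // enough to reach a second rack
  }

  // What the UD rule needs (assumption): one new replica per missing
  // distinct upgrade domain.
  static int replicasNeededForUpgradeDomains(List<String> replicaDomains, int requiredDomains) {
    Set<String> distinct = new HashSet<>(replicaDomains);
    return Math.max(0, requiredDomains - distinct.size());
  }

  // Issue 1: for a block that is mis-placed but not under-replicated, the
  // work is only kept if the target is on a rack the sources do not use.
  static boolean targetAccepted(Set<String> sourceRacks, String targetRack) {
    return !sourceRacks.contains(targetRack);
  }

  public static void main(String[] args) {
    // No racks configured: every DataNode is on the same default rack, so
    // no target can ever add a new rack and the work is retried forever.
    System.out.println(targetAccepted(Set.of("/default-rack"), "/default-rack")); // false

    // Three live replicas, all in ud1, replication factor 3.
    List<String> domains = List.of("ud1", "ud1", "ud1");
    System.out.println(additionalReplicasScheduled(3, 3));           // 1
    System.out.println(replicasNeededForUpgradeDomains(domains, 3)); // 2
  }
}
```

In this model the two problems compound: the scheduler asks for 1 replica where the UD rule needs 2, and on a single-rack cluster even that 1 replica is rejected by the rack check.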

I will open a new Jira for this to see if we can get it fixed, but it may be 
slightly trickier than it sounds with the current code structure.

Note also that HDFS-14053 has been committed since this Jira was opened; it 
allows mis-replicated blocks to be processed via fsck on a path-by-path basis.

> Block Placement policy migrator
> -------------------------------
>
>                 Key: HDFS-8789
>                 URL: https://issues.apache.org/jira/browse/HDFS-8789
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>            Reporter: Chris Trezzo
>            Assignee: Chris Trezzo
>            Priority: Major
>         Attachments: HDFS-8789-trunk-STRAWMAN-v1.patch
>
>
> As we start to add new block placement policies to HDFS, it will be necessary 
> to have a robust tool that can migrate HDFS blocks between placement 
> policies. This jira is for the design and implementation of that tool.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
