[
https://issues.apache.org/jira/browse/HDFS-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16880506#comment-16880506
]
Stephen O'Donnell commented on HDFS-8789:
-----------------------------------------
I've been looking into upgradeDomain (UD). To try to answer the question:
{quote}
What happens on NN restart (to be precise, during the replication queues
initialization)?
Each block will be checked against
{{BlockPlacementPolicy#verifyBlockPlacement()}} and will get added to the
replication queue. When calculating repl work, {{chooseTarget()}} is supposed
to help correct any violation. This happens when the network topology or the
placement policy changes. Does it also work for upgrade domain block placement
policy?
{quote}
What I have found is that if you start with a cluster with no UD and then
enable UD, on restart the NN does notice that all the blocks violate the
placement policy and adds them to the replication queue. However, I believe
there are some issues in the logic that is supposed to correct these violations.
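As a rough illustration of why the blocks get queued, here is a simplified, hypothetical model of an upgrade domain placement check. It is not the real BlockPlacementPolicyWithUpgradeDomain code: the class and method names are made up, and the rule "replicas must cover min(replication, factor) distinct upgrade domains" with a factor of 3 is an assumption made for the sketch.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of an upgrade-domain placement check; not the HDFS API.
public class UdPlacementCheck {
    // Assumed upgrade domain factor; the real value comes from configuration.
    static final int UPGRADE_DOMAIN_FACTOR = 3;

    // A replica location: datanode, rack and upgrade domain.
    record Replica(String node, String rack, String upgradeDomain) {}

    // Count the distinct upgrade domains the replicas are spread over.
    static int distinctDomains(List<Replica> replicas) {
        Set<String> domains = new HashSet<>();
        for (Replica r : replicas) {
            domains.add(r.upgradeDomain());
        }
        return domains.size();
    }

    // Assumed rule: replicas must cover min(replication, factor) domains.
    static boolean isPlacementSatisfied(List<Replica> replicas, int replication) {
        int required = Math.min(replication, UPGRADE_DOMAIN_FACTOR);
        return distinctDomains(replicas) >= required;
    }

    public static void main(String[] args) {
        // A block whose three replicas all sit in a single upgrade domain.
        List<Replica> block = List.of(
                new Replica("dn1", "/default-rack", "ud1"),
                new Replica("dn2", "/default-rack", "ud1"),
                new Replica("dn3", "/default-rack", "ud1"));
        // Prints false, i.e. the block would be flagged and queued.
        System.out.println(isPlacementSatisfied(block, 3));
    }
}
```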
There are at least two issues I have come across:
# With UD enabled but no racks configured, the queued replication work never
makes any progress. blockManager.validateReconstructionWork() checks whether
the new replica increases the number of racks and, if it does not, skips it and
tries again later. Without racks every node is on the same default rack, so
that check can never pass.
# In blockManager.scheduleReconstruction() there is some logic that says if
{{numReplicas.liveReplicas() >= requiredRedundancy}} then only 1 new replica is
needed. That is correct for rack redundancy (we always want 2 racks, so one
replica on a new rack is enough), but for UD we may need 2 new replicas if all
3 existing ones are on the same UD.
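Both problems can be reproduced in a small, hypothetical sketch. The names below are illustrative and only loosely follow the checks described above, not the real BlockManager code; the target of 3 distinct upgrade domains is an assumption. It shows the scheduler asking for 1 new replica where UD needs 2, and a rack-based acceptance test that never passes when every node reports the same default rack.

```java
import java.util.Set;

// Hypothetical sketch of the two checks described above; not the BlockManager API.
public class ReconstructionSketch {
    // Issue 2: once liveReplicas >= requiredRedundancy, only one extra
    // replica is scheduled (enough to reach a second rack).
    static int additionalReplicasScheduled(int liveReplicas, int requiredRedundancy) {
        if (liveReplicas >= requiredRedundancy) {
            return 1;
        }
        return requiredRedundancy - liveReplicas;
    }

    // What fixing the UD spread would need: one new replica per missing domain.
    static int additionalReplicasForUd(Set<String> currentDomains, int requiredDomains) {
        return Math.max(0, requiredDomains - currentDomains.size());
    }

    // Issue 1: a chosen target is only accepted if it lands on a new rack.
    static boolean addsNewRack(Set<String> sourceRacks, String targetRack) {
        return !sourceRacks.contains(targetRack);
    }

    public static void main(String[] args) {
        // Three live replicas, all in upgrade domain "ud1".
        System.out.println(additionalReplicasScheduled(3, 3));         // 1
        System.out.println(additionalReplicasForUd(Set.of("ud1"), 3)); // 2

        // No racks configured: every node reports the same default rack,
        // so no target can ever add a rack and the work is skipped each time.
        System.out.println(addsNewRack(Set.of("/default-rack"), "/default-rack")); // false
    }
}
```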
I will open a new Jira for this to see if we can get it fixed, but it may be
slightly trickier than it sounds with the current code structure.
Note also that HDFS-14053 has been committed since this Jira was opened; it
allows mis-replicated blocks to be processed via fsck on a path-by-path basis.
> Block Placement policy migrator
> -------------------------------
>
> Key: HDFS-8789
> URL: https://issues.apache.org/jira/browse/HDFS-8789
> Project: Hadoop HDFS
> Issue Type: New Feature
> Reporter: Chris Trezzo
> Assignee: Chris Trezzo
> Priority: Major
> Attachments: HDFS-8789-trunk-STRAWMAN-v1.patch
>
>
> As we start to add new block placement policies to HDFS, it will be necessary
> to have a robust tool that can migrate HDFS blocks between placement
> policies. This jira is for the design and implementation of that tool.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)