[
https://issues.apache.org/jira/browse/HDFS-8782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14989130#comment-14989130
]
Duo Zhang commented on HDFS-8782:
---------------------------------
I tried to make {{DataStorage.addStorageLocations}} run in parallel, but I
found it difficult.
Some properties in {{DataStorage}} (inherited from {{StorageInfo}}), such as
{{layoutVersion}}, are updated while loading a {{StorageDirectory}}, so
changing the code from sequential to parallel may have side effects even if I
use locks everywhere to protect these properties.
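One way around this kind of shared-state problem, shown here only as a rough
sketch and not as the actual {{DataStorage}} code, is to have each worker
return its own result and let a single thread apply the results afterwards, so
no shared field is written concurrently. The names {{ParallelLoadSketch}},
{{LoadResult}}, {{loadOne}} and {{loadAll}} are hypothetical and exist only for
this illustration; the real per-directory work is replaced by a stub.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch; none of these types are Hadoop classes.
class ParallelLoadSketch {

    /** Immutable per-directory result, so workers never share mutable state. */
    static final class LoadResult {
        final String dir;
        final int layoutVersion;

        LoadResult(String dir, int layoutVersion) {
            this.dir = dir;
            this.layoutVersion = layoutVersion;
        }
    }

    /** Stub for the per-directory work (assumption: it touches only its own directory). */
    static LoadResult loadOne(String dir, int layoutVersionOnDisk) {
        return new LoadResult(dir, layoutVersionOnDisk);
    }

    /** Loads all directories in parallel and returns the results in input order. */
    static List<LoadResult> loadAll(List<String> dirs, int[] versions, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<LoadResult>> futures = new ArrayList<>();
            for (int i = 0; i < dirs.size(); i++) {
                final String dir = dirs.get(i);
                final int version = versions[i];
                Callable<LoadResult> task = () -> loadOne(dir, version);
                futures.add(pool.submit(task));
            }
            // Only the calling thread collects results; shared fields
            // could be updated here, one result at a time.
            List<LoadResult> results = new ArrayList<>();
            for (Future<LoadResult> f : futures) {
                results.add(f.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while loading", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("loading a directory failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<LoadResult> results =
            loadAll(Arrays.asList("/data1", "/data2"), new int[] {-56, -57}, 2);
        for (LoadResult r : results) {
            System.out.println(r.dir + " layoutVersion=" + r.layoutVersion);
        }
    }
}
```

This only helps if the per-directory work really is independent of the shared
properties, which is exactly what the {{layoutVersion}} updates put in doubt.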
I do not understand why we need a {{layoutVersion}} in {{DataStorage}}. As
far as I know, {{DataStorage}} is only a container of {{StorageDirectory}}, or
{{BlockPoolSliceStorage}} if federation is enabled. So what does the
{{layoutVersion}} in {{DataStorage}} mean? Is there any historical reason for
keeping it?
Thanks.
> Upgrade to block ID-based DN storage layout delays DN registration
> ------------------------------------------------------------------
>
> Key: HDFS-8782
> URL: https://issues.apache.org/jira/browse/HDFS-8782
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: Haohui Mai
> Priority: Critical
>
> We have seen multiple incidents at production sites in which DNs take a
> long time to register with the NN when upgrading to a post-2.6 release.
> Further investigation shows that the DN is blocked when upgrading the storage
> layout introduced in HDFS-6482. The new storage layout requires making up to
> 64k directories in the underlying file system. Unfortunately the current
> implementation calls {{mkdirs()}} sequentially and upgrades each volume in
> sequential order.
> As a result, upgrading a DN with a lot of disks or with blocks that have
> random block IDs takes a long time (usually hours), and the DN won't
> register with the NN until it finishes upgrading all the storage directories.
> The excessive delays confuse operations and break the assumption of rolling
> upgrades.