[ https://issues.apache.org/jira/browse/HDFS-270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12918753#action_12918753 ]
Matt Foley commented on HDFS-270:
---------------------------------
By the way, HDFS-854 was an effort to speed up startup time by parallelizing
the disk scan, but it didn't change the disk scan code that actually runs
during startup. There are actually multiple full disk scans during system startup:
1. Very early in system initialization, in the DataStorage layer, if an Upgrade
has been requested, the entire data store will be scanned and replicated via
hardlinks. This is what really takes a long time.
2. Later on, during execution of the FSVolume constructors, the entire disk is
scanned to create the in-memory FSDir tree.
3. And almost immediately after all FSVolumes have been initialized, the
FSDataset constructor causes another full disk scan to create the in-memory
ReplicasMap.
The last two items seem very inefficient, but together they take only 2-4
minutes on a loaded 4-volume system, versus 45 minutes for the Upgrade
snapshot on the same Datanode.
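To illustrate item 1, here is a minimal sketch of a hard-link snapshot of one storage directory. It assumes a plain directory tree. The class and method names are hypothetical, and this is not Hadoop's DataStorage implementation. The sketch recreates every directory and hard-links every regular file instead of copying it. No block data is copied, but there is still one link operation per file.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper (not Hadoop's DataStorage code): mirrors the tree under
// `current` into `previous`, hard-linking each regular file rather than copying it.
final class HardLinkSnapshot {
    private HardLinkSnapshot() {}

    /** Returns the number of files that were hard-linked. */
    static int linkTree(Path current, Path previous) throws IOException {
        List<Path> entries;
        try (Stream<Path> walk = Files.walk(current)) {
            // Files.walk visits each directory before its contents, so parent
            // directories always exist before we link files into them.
            entries = walk.collect(Collectors.toList());
        }
        int linked = 0;
        for (Path src : entries) {
            Path dst = previous.resolve(current.relativize(src));
            if (Files.isDirectory(src)) {
                Files.createDirectories(dst);
            } else {
                // One metadata operation per file; the block bytes are shared, not copied.
                Files.createLink(dst, src);
                linked++;
            }
        }
        return linked;
    }
}
```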
The code changed in HDFS-854, in the DirectoryScanner module, is used for a
periodic consistency check between the blocks on the disks and the in-memory
ReplicasMap. The periodic BlockReport, however, is generated from memory, not
from disk, and the initial BlockReport at startup is gated by the FSDataset
initialization, not by the DirectoryScanner.
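The consistency check described above can be pictured as a set difference, in both directions, between the block IDs found on disk and the keys of an in-memory map. This is a hypothetical sketch, not the DirectoryScanner API. The class name, the use of Long block IDs, and the file naming are all assumptions. It assumes data files are named `blk_<id>` and metadata files end in `.meta`.

```java
import java.util.Collection;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical DirectoryScanner-style check (not the real HDFS class): compares
// block IDs seen on disk with the keys of an in-memory replica map.
final class BlockReconciler {
    private BlockReconciler() {}

    /** Differences found in one comparison pass. */
    static final class Diff {
        final Set<Long> missingInMemory = new TreeSet<>(); // on disk, not in the map
        final Set<Long> missingOnDisk = new TreeSet<>();   // in the map, not on disk
    }

    /** Assumes data files are named "blk_<id>"; ".meta" files and other names are skipped. */
    static Set<Long> blockIdsFromNames(Collection<String> fileNames) {
        Set<Long> ids = new TreeSet<>();
        for (String name : fileNames) {
            if (name.startsWith("blk_") && !name.endsWith(".meta")) {
                ids.add(Long.parseLong(name.substring("blk_".length())));
            }
        }
        return ids;
    }

    static Diff compare(Set<Long> onDisk, Map<Long, ?> replicaMap) {
        Diff d = new Diff();
        for (Long id : onDisk) {
            if (!replicaMap.containsKey(id)) d.missingInMemory.add(id);
        }
        for (Long id : replicaMap.keySet()) {
            if (!onDisk.contains(id)) d.missingOnDisk.add(id);
        }
        return d;
    }
}
```

A check like this only reads the disk. A BlockReport built from the in-memory map would not need to run it first.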
> DFS Upgrade should process dfs.data.dirs in parallel
> ----------------------------------------------------
>
> Key: HDFS-270
> URL: https://issues.apache.org/jira/browse/HDFS-270
> Project: Hadoop HDFS
> Issue Type: Improvement
> Affects Versions: 0.22.0
> Reporter: Stu Hood
> Assignee: Matt Foley
>
> I just upgraded from 0.14.2 to 0.15.0, and things went very smoothly, if a
> little slowly.
> The main reason the upgrade took so long was the block upgrades on the
> datanodes. Each of our datanodes has 3 drives listed for the dfs.data.dir
> parameter. From looking at the logs, it is fairly clear that the upgrade
> procedure does not attempt to upgrade all listed dfs.data.dir's in parallel.
> I think even if all of your dfs.data.dir's are on the same physical device,
> there would still be an advantage to performing the upgrade process in
> parallel. The less downtime, the better: especially if it is potentially 20
> minutes versus 60 minutes.
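Here is the arithmetic behind the reporter's estimate. With 3 volumes that each take about 20 minutes, a sequential upgrade takes about 3 × 20 = 60 minutes. Running them concurrently is bounded by the slowest volume, roughly 20 minutes if the disks do not contend.

Below is a minimal sketch of that approach. It assumes the per-directory work is independent. The class, interface, and method names are hypothetical, not Hadoop APIs. The sketch submits one task per dfs.data.dir to a thread pool and waits for all of them.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch (not a Hadoop class): upgrade all data directories concurrently.
final class ParallelUpgrade {
    private ParallelUpgrade() {}

    /** Hypothetical per-directory upgrade step, e.g. a hard-link snapshot of one volume. */
    interface DirTask {
        void run(String dataDir) throws IOException;
    }

    static void upgradeAll(List<String> dataDirs, DirTask task)
            throws IOException, InterruptedException {
        // One thread per directory so each volume's disk can work independently.
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, dataDirs.size()));
        try {
            List<Future<Void>> results = new ArrayList<>();
            for (String dir : dataDirs) {
                Callable<Void> job = () -> {
                    task.run(dir);
                    return null;
                };
                results.add(pool.submit(job));
            }
            // Wait for each directory in order; rethrow the first failure encountered.
            for (Future<Void> f : results) {
                try {
                    f.get();
                } catch (ExecutionException e) {
                    Throwable cause = e.getCause();
                    if (cause instanceof IOException) {
                        throw (IOException) cause;
                    }
                    throw new IOException("upgrade of a data directory failed", cause);
                }
            }
        } finally {
            // Ask any still-running tasks to stop (interrupts them) and release the threads.
            pool.shutdownNow();
        }
    }
}
```

How much this gains depends on how well the underlying devices handle concurrent work. Directories that share one physical device will compete for it, so the speedup there is likely smaller than the 3x figure above.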
--
This message is automatically generated by JIRA.