Improve Datanode startup time
-----------------------------
Key: HDFS-1443
URL: https://issues.apache.org/jira/browse/HDFS-1443
Project: Hadoop HDFS
Issue Type: Improvement
Components: data-node
Affects Versions: 0.20.2
Reporter: Matt Foley
Assignee: Matt Foley
Fix For: 0.22.0
One of the factors slowing down cluster restart is the startup time for the
Datanodes. In particular, if Upgrade is needed, the Datanodes must do a
Snapshot and this can take 5-15 minutes per volume, serially. Thus, for a
4-disk datanode, it may be 45 minutes before it is ready to send its initial
Block Report to the Namenode. This is an umbrella bug for the following four
pieces of work to improve Datanode startup time:
1. Batch the calls in DataStorage to FileUtil.createHardLink(), so we call it
once per directory instead of once per file. This is the biggest villain,
responsible for 90% of that 45-minute delay. See the subordinate bug for
details; a batching sketch appears after this list.
2. Refactor the Upgrade process in DataStorage to run volume-parallel. There is
already a bug open for this, HDFS-270, and the volume-parallel work in
DirectoryScanner from HDFS-854 is a good foundation to build on; a thread-pool
sketch appears after this list.
3. Refactor the FSDir() and getVolumeMap() call chains in FSDataset so that they
share data and run volume-parallel. Currently the two constructors for the
in-memory directory tree and the replicas map run THREE full scans of the entire
disk: once in FSDir(), once in recoverTempUnlinkedBlock(), and once in
addToReplicasMap(). During each scan, a new File object is created for each of
the 100,000 or so items in the native file system (for a 50,000-block node).
This impacts GC as well as disk traffic. A single-pass sketch appears after
this list.
4. Make getGenerationStampFromFile() more efficient. Currently this routine is
called by addToReplicasMap() for every block file in the directory tree, and it
does a full listing of each file's containing directory on every call. This is
equivalent to doing many more full disk scans. The underlying disk I/O buffers
probably prevent disk thrashing, but we still create bazillions of unnecessary
File objects that need to be GC'ed. A simple refactoring prevents this; a
sketch appears after this list.
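
For item 1, here is a minimal sketch of the per-directory batching idea, assuming
a POSIX "ln" that accepts multiple source files and a target directory: fork one
process per directory instead of one per block file. The names HardLinkBatcher
and createHardLinkMult are hypothetical, not the actual DataStorage/FileUtil API,
and a real patch would also have to respect command-line length limits.

{code:java}
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not the real FileUtil API: hard-link all regular
// files in srcDir into dstDir with a single "ln" invocation per directory.
public class HardLinkBatcher {

  public static void createHardLinkMult(File srcDir, File dstDir)
      throws IOException, InterruptedException {
    File[] files = srcDir.listFiles(File::isFile);
    if (files == null || files.length == 0) {
      return;                              // nothing to link in this directory
    }
    List<String> cmd = new ArrayList<>();
    cmd.add("ln");                         // POSIX: ln SRC... TARGET_DIR
    for (File f : files) {
      cmd.add(f.getAbsolutePath());
    }
    cmd.add(dstDir.getAbsolutePath());

    Process p = new ProcessBuilder(cmd).inheritIO().start();
    if (p.waitFor() != 0) {
      throw new IOException("ln failed for directory " + srcDir);
    }
  }
}
{code}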
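For item 2, a sketch of running the per-volume upgrade in parallel with a
fixed-size thread pool; doUpgrade() below is a stand-in for the real
snapshot/hard-link work on one volume in DataStorage, not its actual signature.

{code:java}
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: one upgrade task per volume; startup blocks until all finish.
public class ParallelVolumeUpgrade {

  public static void upgradeAll(List<File> volumeRoots) throws Exception {
    if (volumeRoots.isEmpty()) {
      return;
    }
    ExecutorService pool = Executors.newFixedThreadPool(volumeRoots.size());
    List<Future<?>> results = new ArrayList<>();
    for (File vol : volumeRoots) {
      results.add(pool.submit(() -> doUpgrade(vol)));
    }
    pool.shutdown();
    for (Future<?> r : results) {
      r.get();                             // propagate the first failure, if any
    }
  }

  private static Void doUpgrade(File volumeRoot) {
    // placeholder for the real per-volume snapshot / hard-link work
    System.out.println("upgrading " + volumeRoot);
    return null;
  }
}
{code}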
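For item 3, a sketch of the single-pass part of the idea: build the replicas map
while walking the directory tree, so each directory is listed and each File
object created only once. The class name, the blockId-to-file map, and the
"blk_" name check are simplified stand-ins for the real FSDir/volumeMap
structures, not the actual FSDataset code.

{code:java}
import java.io.File;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: one recursive walk populates the replicas map directly,
// instead of separate full scans for the directory tree and the map.
public class SinglePassVolumeScan {

  public static Map<Long, File> scan(File dir, Map<Long, File> replicas) {
    File[] children = dir.listFiles();     // list each directory exactly once
    if (children == null) {
      return replicas;
    }
    for (File child : children) {
      String name = child.getName();
      if (child.isDirectory()) {
        scan(child, replicas);             // recurse into the subdir tree
      } else if (name.startsWith("blk_") && !name.endsWith(".meta")) {
        long blockId = Long.parseLong(name.substring("blk_".length()));
        replicas.put(blockId, child);      // reuse the same File object
      }
    }
    return replicas;
  }

  public static void main(String[] args) {
    Map<Long, File> map = scan(new File(args[0]), new HashMap<>());
    System.out.println("found " + map.size() + " block files");
  }
}
{code}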
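For item 4, a sketch of the simple refactoring: the caller lists each directory
once and passes that listing in, so the lookup stops re-listing the parent
directory for every block file. The signature and the grandfather-stamp constant
below are illustrative, not the real FSDataset/GenerationStamp code.

{code:java}
import java.io.File;

// Hypothetical sketch: generation-stamp lookup over a pre-fetched sibling listing.
public class GenStampLookup {

  static final long GRANDFATHER_GENERATION_STAMP = 0;

  public static long getGenerationStampFromFile(File[] siblings, File blockFile) {
    String prefix = blockFile.getName() + "_";         // e.g. "blk_1234_"
    for (File f : siblings) {
      String name = f.getName();
      if (name.startsWith(prefix) && name.endsWith(".meta")) {
        // "blk_1234_5678.meta" -> generation stamp 5678
        return Long.parseLong(
            name.substring(prefix.length(), name.length() - ".meta".length()));
      }
    }
    return GRANDFATHER_GENERATION_STAMP;               // no meta file found
  }

  // Caller lists each directory once and reuses the array for every block in it:
  //   File[] siblings = dir.listFiles();
  //   long gs = getGenerationStampFromFile(siblings, blockFile);
}
{code}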