Improve Datanode startup time
-----------------------------

                 Key: HDFS-1443
                 URL: https://issues.apache.org/jira/browse/HDFS-1443
             Project: Hadoop HDFS
          Issue Type: Improvement
          Components: data-node
    Affects Versions: 0.20.2
            Reporter: Matt Foley
            Assignee: Matt Foley
             Fix For: 0.22.0


One of the factors slowing down cluster restart is the startup time of the 
Datanodes.  In particular, if an Upgrade is needed, each Datanode must take a 
Snapshot of its storage, and this can take 5-15 minutes per volume, serially.  
Thus, for a 4-disk datanode, it may be 45 minutes before the node is ready to 
send its initial Block Report to the Namenode.  This is an umbrella bug for 
the following four pieces of work to improve Datanode startup time:

1. Batch the calls in DataStorage to FileUtil.createHardLink(), so we call it 
once per directory instead of once per file.  This is the biggest villain, 
responsible for 90% of that 45-minute delay.  See the subordinate bug for 
details.

2. Refactor the Upgrade process in DataStorage to run volume-parallel.  A bug 
is already open for this (HDFS-270), and the volume-parallel work in 
DirectoryScanner from HDFS-854 is a good foundation to build on.

3. Refactor the FSDir() and getVolumeMap() call chains in FSDataset so they 
share data and run volume-parallel.  Currently, the two constructors for the 
in-memory directory tree and the replicas map run THREE full scans of the 
entire disk: once in FSDir(), once in recoverTempUnlinkedBlock(), and once in 
addToReplicasMap().  During each scan, a new File object is created for each 
of the 100,000 or so items in the native file system (for a 50,000-block 
node).  This impacts GC as well as disk traffic.
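
A sketch of the shared single-pass scan (the blk_ file naming is the real 
on-disk convention, but the method and map below are illustrative, not the 
actual FSDataset code):

{code:java}
import java.io.File;
import java.util.Map;

public class SingleScan {
  /**
   * One recursive pass that calls File.listFiles() exactly once per
   * directory and reuses the resulting File objects, so the directory
   * tree and the replicas map can be built from the same scan instead
   * of three separate full-disk scans.
   */
  static void scanOnce(File dir, Map<Long, File> replicasMap) {
    File[] entries = dir.listFiles();    // the only listing of this dir
    if (entries == null) {
      return;
    }
    for (File f : entries) {
      String name = f.getName();
      if (f.isDirectory()) {
        scanOnce(f, replicasMap);        // descend into the subdir tree
      } else if (name.startsWith("blk_") && !name.endsWith(".meta")) {
        long blockId = Long.parseLong(name.substring("blk_".length()));
        replicasMap.put(blockId, f);     // replicas map in the same pass
      }
    }
  }
}
{code}

Combined with the volume-parallel execution of item 2 (one scanOnce per 
volume root), this cuts both the disk traffic and the transient File garbage 
to roughly a third of the current cost.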

4. Make getGenerationStampFromFile() more efficient.  Currently this routine 
is called by addToReplicasMap() for every blockfile in the directory tree, 
and it does a full listing of each file's containing directory on every call.  
This is equivalent to doing many MORE full disk scans.  The underlying disk 
I/O buffers probably prevent disk thrashing, but we are still creating 
bazillions of unnecessary File objects that need to be GC'ed.  There is a 
simple refactoring that prevents this.
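
The refactoring is roughly this (a standalone sketch; the real method lives 
in FSDataset, and 0 stands in for the "grandfather" generation stamp 
constant):

{code:java}
public class GenStampLookup {
  /**
   * Find the generation stamp for a block file by scanning a directory
   * listing that the caller obtained ONCE for the whole directory,
   * instead of re-listing the directory on every call.  Meta files are
   * named <blockfile>_<genstamp>.meta.
   */
  static long getGenerationStampFromFile(String[] listdir, String blockName) {
    String prefix = blockName + "_";
    for (String name : listdir) {
      if (name.startsWith(prefix) && name.endsWith(".meta")) {
        return Long.parseLong(
            name.substring(prefix.length(), name.length() - ".meta".length()));
      }
    }
    return 0;  // placeholder for the "grandfather" generation stamp
  }
}
{code}

The caller can obtain the listing once per directory during its scan and 
reuse the same array for every block file in that directory, making the 
lookup a pure in-memory scan.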


