CLONE to COMMON - Batch the calls in DataStorage to FileUtil.createHardLink(),
so we call it once per directory instead of once per file
----------------------------------------------------------------------------------------------------------------------------------------
Key: HDFS-1617
URL: https://issues.apache.org/jira/browse/HDFS-1617
Project: Hadoop HDFS
Issue Type: Sub-task
Components: data-node
Affects Versions: 0.20.2
Reporter: Matt Foley
Assignee: Matt Foley
Fix For: 0.22.0
It was a bit of a puzzle why we can do a full scan of a disk in about 30
seconds during FSDir() or getVolumeMap(), yet the same disk takes 11 minutes
to do upgrade replication via hardlinks. It turns out that the
org.apache.hadoop.fs.FileUtil.createHardLink() method does an outcall to
Runtime.getRuntime().exec() to utilize the native filesystem's hardlink
capability. So it forks a full-weight external process, and we do that once
for each individual file to be replicated.
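The proposed batching can be sketched roughly as follows. This is an illustrative sketch, not the actual patch: the class and method names are hypothetical, and it assumes a GNU-style "ln" that accepts multiple sources followed by a target directory (subject to the platform's command-line length limit, which bounds how many files fit in one batch).

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class BatchHardLink {
    // Hypothetical sketch (not the HDFS patch itself): replace one
    // "ln <src> <dst>" fork per file with a single
    // "ln <src1> <src2> ... <dstDir>" fork per directory.
    public static void hardLinkDirectory(File srcDir, File dstDir)
            throws IOException, InterruptedException {
        File[] entries = srcDir.listFiles();
        if (entries == null) {
            throw new IOException(srcDir + " is not a readable directory");
        }
        List<String> cmd = new ArrayList<>();
        cmd.add("ln");                        // native hardlink support
        for (File f : entries) {
            if (f.isFile()) {
                cmd.add(f.getAbsolutePath());
            }
        }
        if (cmd.size() == 1) {
            return;                           // empty directory: nothing to link
        }
        cmd.add(dstDir.getAbsolutePath());    // last arg: target directory
        Process p = new ProcessBuilder(cmd).inheritIO().start();
        int rc = p.waitFor();
        if (rc != 0) {
            throw new IOException("ln exited with status " + rc);
        }
    }
}
```

With this shape, the number of forked processes drops from one per file to one per directory, which is where the savings in the measurements below come from.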
As a simple check on the possible cost of this approach, I wrote a Perl test
script (under Linux, on a production-class datanode). Like the JVM, Perl runs
a compiled and optimized p-code engine, and it has both native support for
hardlinks and the ability to do "exec".
- A simple script that created 256,000 files in a directory tree organized
like the Datanode's took 10 seconds to run.
- Replicating that directory tree with hardlinks, the same way the Datanode
does, took 12 seconds using native hardlink support.
- The same replication using outcalls to exec, one per file, took 256 seconds!
- Batching the calls, doing one exec per directory instead of one per file,
took 16 seconds.
Obviously, your mileage will vary with the number of blocks per volume. A
volume with fewer than about 4,000 blocks will have only 65 directories; a
volume with more than about 4,000 and fewer than about 250,000 blocks will
have roughly 4,200 directories. And there are two files per block (the data
file and the .meta file), so the average ratio of files to directories may
vary from 2:1 to 500:1. A node with 50,000 blocks across four volumes will
have 25,000 files per volume, for an average of about 6:1. So this change may
be expected to take the upgrade down from, say, 12 minutes per volume to
about 2.
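The arithmetic above, using the example node's numbers (all constants below are taken from or derived from the text; the class name is just for illustration):

```java
public class ExecCountEstimate {
    public static void main(String[] args) {
        int blocks = 50_000;   // blocks on the example node
        int volumes = 4;       // volumes on the node
        int dirs = 4_200;      // directories per volume (large-volume case)

        // Two files per block: the data file and its .meta file.
        int filesPerVolume = (blocks / volumes) * 2;  // 25,000 files

        // One exec per file (current code) vs. one exec per directory (batched).
        int execsPerFileApproach = filesPerVolume;    // 25,000 forks
        int execsPerDirApproach = dirs;               //  4,200 forks

        System.out.println(filesPerVolume);                           // 25000
        System.out.println(filesPerVolume / dirs);                    // ~6:1 files per directory
        System.out.println(execsPerFileApproach / execsPerDirApproach); // ~5x fewer forks
    }
}
```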