[ 
https://issues.apache.org/jira/browse/HADOOP-2423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12574786#action_12574786
 ] 

Konstantin Shvachko commented on HADOOP-2423:
---------------------------------------------

This patch actually does 2 things. 
# It introduces a class that caches INodes and local names on the path.
# Optimizes the mkDirs() using this class.

I agree both tasks are important, but if you just want to optimize mkDirs() as 
stated above then it could be done much easier.
The INode.getExistingPathINodes() should be made package private rather than 
INode-private.
And then you can get all components of the path and all existing inodes along 
the path by
{code}
FSDirectory.mkDirs(src, ... ) {
    byte[][] components = INode.getPathComponents(src);
    INode[] inodes = new INode[components.length];
    rootDir.getExistingPathINodes(components, inodes);

    // create missing inodes
    ...............
}
{code}
After that you just find first null in the array of inodes, and start adding 
directories using addNode() and fill in the null elements one by one.
You don't even need to start from the rootDir for every component.

Introducing the caching class is important but will take more effort, because 
we would probably want to to convert all String-path 
parameters to it and use across all name-node parts. This is going to be a big 
change, and it will require proper testing,
benchmarking, etc.

IMO optimizing mkDirs is important by itself, since it is extensively used by 
our clients.

> The codes in FSDirectory.mkdirs(...) is inefficient.
> ----------------------------------------------------
>
>                 Key: HADOOP-2423
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2423
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: dfs
>    Affects Versions: 0.15.1
>            Reporter: Tsz Wo (Nicholas), SZE
>            Assignee: Tsz Wo (Nicholas), SZE
>         Attachments: 2423_20080130.patch, 2423_20080303.patch
>
>
> FSDirectory.mkdirs(...) creates List<String> v to store all dirs.  e.g.
> {code}
> //Suppose 
> src = "/foo/bar/bas/"
> //Then,
> v = {"/", "/foo", "/foo/bar", "/foo/bar/bas"}
> {code}
> For each directory string *cur* in v, no matter *cur* already exists or not, 
> it will try to do a unprotectedMkdir(cur, ...).  Then, *cur* is parsed to 
> byte[][] in INodeDirectory.addNode (...).
> We don't need to do the parsing for each string in v.  Instead, byte[][] 
> should be stored.  Also, the loop should not continue once it finds an 
> existing subdirectory.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to