[ https://issues.apache.org/jira/browse/HDFS-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13028879#comment-13028879 ]
Hairong Kuang commented on HDFS-78:
-----------------------------------
In our HDFS cluster, the NameNode becomes very slow when a large number of
files (in the millions) are open for write concurrently. Profiling the NN shows
that the most expensive operation in the file create/close code path is
INode#getPathComponents, which is called when traversing the file path. As
Konstantin pointed out, startFile/getAdditionalBlock/completeFile unnecessarily
traverse the path multiple times. Fixing this should greatly improve the
NameNode's performance.
> Eliminate redundant searches in the namespace directory tree.
> -------------------------------------------------------------
>
> Key: HDFS-78
> URL: https://issues.apache.org/jira/browse/HDFS-78
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: Konstantin Shvachko
>
> There is no need to look for the same INode multiple times in the same
> name-node operation.
> For example, in FSNamesystem.exists():
> {code}
> public boolean exists(String src) {
>   if (dir.getFileBlocks(src) != null || dir.isDir(src)) {
>     return true;
>   } else {
>     return false;
>   }
> }
> {code}
> both getFileBlocks() and isDir() call rootDir.getNode(src) internally, which
> causes two separate lookups in the directory tree when one is enough.
> Why not look up the inode once and then check both whether it is a directory
> and whether it has blocks?
> Other methods do the same thing:
> - completeFile() calls getINode in different parts at least 3 times.
> - getAdditionalBlock() - 2 getINode calls
> - startFile() - I counted 5 calls, and may have missed some.
> To prevent this, we should define all methods below the top level in terms of
> INode parameters rather than path names.
> E.g. all FSDirectory methods should take an INode as a parameter, not a String.
> We should be careful, though, not to reuse an INode across separate
> synchronized sections.
> Once the lock is released, the INode must be looked up by its path again.
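A minimal sketch of the single-lookup idea from the description, not the actual
HDFS code. Node, getNode() and the lookups counter are hypothetical stand-ins
for INode and rootDir.getNode(src); only the method names exists(),
getFileBlocks() and isDir() come from the issue text. The path is resolved once
under the lock, and the helpers take the resolved node instead of a String.

```java
import java.util.HashMap;
import java.util.Map;

public class SingleLookupSketch {
    /** Hypothetical stand-in for an HDFS INode; not the real class. */
    static class Node {
        final boolean dir;
        final Object[] blocks; // null for directories
        final Map<String, Node> children = new HashMap<>();

        Node(boolean dir, Object[] blocks) {
            this.dir = dir;
            this.blocks = blocks;
        }
    }

    static final Node root = new Node(true, null);
    static int lookups = 0; // counts tree traversals, for illustration only

    /** Walks the tree from the root; this is the expensive step to avoid repeating. */
    static Node getNode(String src) {
        lookups++;
        Node cur = root;
        for (String part : src.split("/")) {
            if (part.isEmpty()) {
                continue; // skip the empty component before the leading '/'
            }
            cur = cur.children.get(part);
            if (cur == null) {
                return null; // path does not exist
            }
        }
        return cur;
    }

    // Helpers below the top level take the resolved node, not the path.
    static boolean isDir(Node n) {
        return n != null && n.dir;
    }

    static Object[] getFileBlocks(Node n) {
        return (n == null || n.dir) ? null : n.blocks;
    }

    /**
     * One lookup per operation. The Node is only used inside this synchronized
     * section; a later operation must resolve the path again.
     */
    static synchronized boolean exists(String src) {
        Node n = getNode(src);
        return getFileBlocks(n) != null || isDir(n);
    }

    public static void main(String[] args) {
        Node user = new Node(true, null);
        root.children.put("user", user);
        user.children.put("a.txt", new Node(false, new Object[0]));

        System.out.println(exists("/user/a.txt"));
        System.out.println(exists("/user"));
        System.out.println(exists("/missing"));
        System.out.println("lookups = " + lookups);
    }
}
```

In this sketch each exists() call traverses the tree once, where the original
version would traverse it twice. The same pattern would apply to the
completeFile(), getAdditionalBlock() and startFile() paths listed above, under
the assumption that each of them runs inside a single synchronized section.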