[
https://issues.apache.org/jira/browse/HDFS-946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Hairong Kuang updated HDFS-946:
-------------------------------
Attachment: HDFSFileStatus.patch
This patch does the following:
1. Create a new file status in HDFS, called HDFSFileStatus, for over-the-wire
transfer, in which the path contains only the local name of a path, not the
full path. The path is also represented as a byte array in Java UTF8
encoding, the same form as the name stored in each inode.
2. Change ClientProtocol getListing to return HDFSFileStatus[] and getFileInfo
to return HDFSFileStatus.
a. The path in the return value of getFileInfo is always an empty byte array.
b. If listStatus is called on a file, the path in the only HDFSFileStatus
returned is also an empty byte array.
c. If listStatus is called on a directory, the path in each HDFSFileStatus in
the returned array contains the local name of the directory entry.
3. FileSystem#getFileStatus and FileSystem#listStatus still see FileStatus
which contains the full path name.
4. Unit tests are added to TestFileStatus to verify the change.
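
As a rough illustration of items 1 to 3 (this is not the code in the patch),
the sketch below shows how a client could rebuild the full path from the
parent path it queried plus the local name received over the wire. The names
LocalNameStatusSketch, WireStatus and fullPath are hypothetical, and
StandardCharsets.UTF_8 is used as a stand-in for the "Java UTF8" encoding
mentioned above.

```java
import java.nio.charset.StandardCharsets;

public class LocalNameStatusSketch {

    /** Hypothetical stand-in for the wire-level status: only the local name travels. */
    public static final class WireStatus {
        public final byte[] localName; // empty when the status describes the queried path itself
        public final long length;
        public final boolean isDir;

        public WireStatus(byte[] localName, long length, boolean isDir) {
            this.localName = localName;
            this.length = length;
            this.isDir = isDir;
        }
    }

    /**
     * Rebuilds the full path on the client side.
     * An empty local name means the status refers to the parent path itself
     * (the getFileInfo case, or listStatus called on a file).
     */
    public static String fullPath(String parent, WireStatus status) {
        if (status.localName.length == 0) {
            return parent;
        }
        String name = new String(status.localName, StandardCharsets.UTF_8);
        return parent.endsWith("/") ? parent + name : parent + "/" + name;
    }

    public static void main(String[] args) {
        // Entry returned when listing the directory /user/foo.
        WireStatus entry = new WireStatus("a.txt".getBytes(StandardCharsets.UTF_8), 42L, false);
        // Status returned by a getFileInfo-style call: empty local name.
        WireStatus self = new WireStatus(new byte[0], 0L, true);

        System.out.println(fullPath("/user/foo", entry));
        System.out.println(fullPath("/user/foo", self));
    }
}
```

The point of the design is that the server sends each name once, as the bytes
it already holds, and the caller, which already knows the parent path, pays
the cost of building the full path.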
> NameNode should not return full path name when listing a directory or getting
> the status of a file
> -------------------------------------------------------------------------------------------------
>
> Key: HDFS-946
> URL: https://issues.apache.org/jira/browse/HDFS-946
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: Hairong Kuang
> Fix For: 0.22.0
>
> Attachments: HDFSFileStatus.patch, HDFSFileStatus.patch
>
>
> FSDirectory#getListing(String src) has the following code:
>   int i = 0;
>   for (INode cur : contents) {
>     listing[i] = createFileStatus(srcs+cur.getLocalName(), cur);
>     i++;
>   }
> So listing a directory will return an array of FileStatus. Each FileStatus
> element has the full path name. This increases the return message size and
> adds non-negligible CPU time to the operation.
> FSDirectory#getFileInfo(String) does not need to return the file name either.
> Another optimization is that in the version of FileStatus that's used in the
> wire protocol, the path field does not need to be a Path; it could be a String
> or, ideally, a byte array. This would avoid unnecessary creation of Path
> objects at the NameNode, thus helping to reduce the GC problem observed when a
> large number of getFileInfo or getListing operations hit the NameNode.