[
https://issues.apache.org/jira/browse/HADOOP-713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
dhruba borthakur updated HADOOP-713:
------------------------------------
Attachment: optimizeComputeContentLen.patch
A DfsPath object for a directory used to carry the total size of all files
inside that directory. This meant that INodeDirectory.computeContentsLength
had to recursively traverse every node in the specified subtree to compute
the directory's size, which used up a lot of CPU on the namenode.
The fix proposed here is for the namenode to return a size of 0 for
directories. The client instead computes a directory's size by recursively
traversing all nodes in the subtree.
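For illustration, a minimal sketch of that client-side computation; the
DfsListing interface and its method names are hypothetical stand-ins, not
the actual DFSClient API:

    import java.io.IOException;

    // Hypothetical listing interface; a stand-in for the real client API.
    interface DfsListing {
        boolean isDirectory(String path) throws IOException;
        long fileLength(String path) throws IOException;        // namenode now returns 0 for directories
        String[] listChildren(String path) throws IOException;  // immediate children only
    }

    class DirectorySize {
        // Recursively sums file lengths under 'path' on the client,
        // replacing the namenode-side computeContentsLength traversal.
        static long contentsLength(DfsListing dfs, String path) throws IOException {
            if (!dfs.isDirectory(path)) {
                return dfs.fileLength(path);
            }
            long total = 0;
            for (String child : dfs.listChildren(path)) {
                total += contentsLength(dfs, child);
            }
            return total;
        }
    }

This trades one expensive namenode-side traversal for a series of cheap list
calls from the client, so namenode CPU and the directory-tree lock are no
longer consumed by the full subtree walk.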
> dfs list operation is too expensive
> -----------------------------------
>
> Key: HADOOP-713
> URL: https://issues.apache.org/jira/browse/HADOOP-713
> Project: Hadoop
> Issue Type: Improvement
> Components: dfs
> Affects Versions: 0.8.0
> Reporter: Hairong Kuang
> Assignee: dhruba borthakur
> Priority: Blocker
> Fix For: 0.15.1
>
> Attachments: optimizeComputeContentLen.patch
>
>
> A list request to dfs returns an array of DFSFileInfo. A DFSFileInfo of a
> directory contains a field called contentsLen, indicating its size, which
> gets computed on the namenode side by recursively going through its subdirs.
> While this computation runs, the whole dfs directory tree is locked.
> The list operation is used heavily by DFSClient for listing a directory,
> getting a file's size and number of replicas, and getting the size of dfs.
> Only the last operation needs the contentsLen field to be computed.
> To reduce its cost, we can add a flag to the list request; contentsLen is
> computed only if the flag is set (see the sketch below). By default, the
> flag is false.
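>
> A hedged sketch of that flag idea; the request type and namenode helper
> below are hypothetical stand-ins for the dfs list protocol, reusing the
> computeContentsLength name from the comment above:
>
>     // Hypothetical stand-ins for the namenode's directory node and the
>     // dfs list request; not the real Hadoop classes.
>     interface DirNode {
>         long computeContentsLength();  // recursive subtree walk
>     }
>
>     class ListRequest {
>         final String path;
>         final boolean computeContentsLen;  // false by default, per the proposal
>
>         ListRequest(String path) {
>             this(path, false);
>         }
>
>         ListRequest(String path, boolean computeContentsLen) {
>             this.path = path;
>             this.computeContentsLen = computeContentsLen;
>         }
>     }
>
>     class NameNodeSide {
>         // Pay the recursive traversal (and the directory-tree lock)
>         // only when the caller actually asked for contentsLen.
>         static long contentsLenFor(DirNode dir, ListRequest req) {
>             return req.computeContentsLen ? dir.computeContentsLength() : 0;
>         }
>     }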