[ https://issues.apache.org/jira/browse/HDFS-6673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14284580#comment-14284580 ]
Lei (Eddy) Xu commented on HDFS-6673:
-------------------------------------
Hi, [~wheat9]
Thank you very much for pointing this out. In your patch, you dump inodes to
LevelDB sorted by their parent IDs. I have tried this approach, but in my
experiments, the time spent dumping the inodes and scanning LevelDB
_sequentially_ outweighs the benefit of the sequential scan.
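For comparison, here is a rough sketch of the parent-ID-sorted layout. It is not taken from either patch: a {{TreeMap}} with an unsigned bytewise comparator stands in for LevelDB's default key ordering, and the 16-byte (parent ID, inode ID) key layout and the class name {{ParentSortedScan}} are assumptions for illustration. Once keys are laid out this way, one sequential scan visits all children of a directory together, but only after every inode has been written into the store.

{code:java}
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: store inodes under (parentId, inodeId) keys, then scan in key order. */
public class ParentSortedScan {
  // Stand-in for LevelDB's default comparator, which orders keys as unsigned bytes.
  static final Comparator<byte[]> BYTEWISE = Arrays::compareUnsigned;

  // 8-byte big-endian parent ID followed by 8-byte big-endian inode ID; for
  // non-negative IDs, bytewise order equals numeric (parent, inode) order.
  static byte[] key(long parentId, long inodeId) {
    return ByteBuffer.allocate(16).putLong(parentId).putLong(inodeId).array();
  }

  /** Dumps the entries in the given (fsimage) order, then returns them in scan order. */
  static List<String> dumpAndScan(long[][] parentAndInode, String[] names) {
    TreeMap<byte[], String> store = new TreeMap<>(BYTEWISE);
    for (int i = 0; i < names.length; i++) {
      store.put(key(parentAndInode[i][0], parentAndInode[i][1]), names[i]);
    }
    List<String> scanned = new ArrayList<>();
    for (Map.Entry<byte[], String> e : store.entrySet()) {
      ByteBuffer buf = ByteBuffer.wrap(e.getKey());
      long parent = buf.getLong();
      long inode = buf.getLong();
      scanned.add(parent + "/" + inode + ":" + e.getValue());
    }
    return scanned;
  }

  public static void main(String[] args) {
    // Entries arrive out of parent order; the sorted keys regroup them by parent.
    long[][] ids = {{16385, 16390}, {16386, 16391}, {16385, 16388}};
    String[] names = {"b.txt", "c.txt", "a.txt"};
    // Prints [16385/16388:a.txt, 16385/16390:b.txt, 16386/16391:c.txt]
    System.out.println(dumpAndScan(ids, names));
  }
}
{code}

The extra cost in this design is the full write pass into the store before any output can be produced, which is the overhead measured above.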
In the current patch, I assume that the inodes of one directory were written to
the fsimage sequentially. Thus, in the second pass over the INode section, which
generates the text output, the parent directory's path stays relatively stable
in the LRU cache, as shown here:
{code}
@Override
public String getParentPath(long inode) throws IOException {
  // The root directory has no parent.
  if (inode == INodeId.ROOT_INODE_ID) {
    return "/";
  }
  // One LevelDB lookup per inode: inode ID -> parent directory ID.
  byte[] bytes = dirChildMap.get(toBytes(inode));
  Preconditions.checkState(bytes != null && bytes.length == 8,
      "Cannot find parent directory for inode %s, "
      + "the fsimage might be corrupted.", inode);
  long parent = toLong(bytes);
  // Only on an LRU cache miss: look up the parent's name, build its full path
  // recursively, and cache it for the parent's remaining children.
  if (!dirPathCache.containsKey(parent)) {
    bytes = dirMap.get(toBytes(parent));
    if (parent != INodeId.ROOT_INODE_ID) {
      Preconditions.checkState(bytes != null,
          "Cannot find parent directory for inode %s, "
          + "the fsimage might be corrupted.", parent);
    }
    String parentName = toString(bytes);
    String parentPath =
        new File(getParentPath(parent), parentName).toString();
    dirPathCache.put(parent, parentPath);
  }
  return dirPathCache.get(parent);
}
{code}
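The snippet does not show how {{dirPathCache}} is constructed. Assuming it is a bounded LRU map, a minimal sketch is a {{LinkedHashMap}} in access order that evicts its eldest entry once it grows past a capacity; the class name {{LruCache}}, the capacity, and the IDs below are hypothetical. Note that {{containsKey()}} does not refresh an entry's recency, while {{get()}} and {{put()}} do.

{code:java}
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical bounded LRU map, e.g. directory ID -> full directory path. */
public class LruCache<K, V> extends LinkedHashMap<K, V> {
  private static final long serialVersionUID = 1L;
  private final int capacity;

  public LruCache(int capacity) {
    // accessOrder = true: get() and put() move an entry to the most-recent end.
    super(16, 0.75f, true);
    this.capacity = capacity;
  }

  @Override
  protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
    // Called after each insertion; drop the least-recently-used entry when over capacity.
    return size() > capacity;
  }

  public static void main(String[] args) {
    LruCache<Long, String> dirPathCache = new LruCache<>(2);
    dirPathCache.put(16386L, "/user");
    dirPathCache.put(16387L, "/user/alice");
    dirPathCache.get(16386L);                  // /user becomes most recently used
    dirPathCache.put(16391L, "/tmp");          // evicts 16387, the least recently used
    System.out.println(dirPathCache.keySet()); // [16386, 16391]
  }
}
{code}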
Thus, even though it is not a completely sequential scan by directory ID, it
involves only one seek per INode (the {{dirChildMap}} lookup); the {{dirMap}}
lookup happens only on an LRU cache miss.
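To make the per-INode lookup count concrete, here is a hypothetical simulation of the {{getParentPath()}} pattern above: plain {{HashMap}}s stand in for the two LevelDB tables, a counter records every store lookup, paths are joined with string concatenation instead of {{File}}, and the root ID, directory names, and cache size are made up for the example.

{code:java}
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical simulation of getParentPath() that counts store lookups. */
public class ParentPathLookupDemo {
  static final long ROOT = 16385L; // assumed root inode ID for this example

  // In-memory stand-ins for the LevelDB tables.
  final Map<Long, Long> dirChildMap = new HashMap<>(); // inode ID -> parent ID
  final Map<Long, String> dirMap = new HashMap<>();    // directory ID -> local name
  final Map<Long, String> dirPathCache;                // directory ID -> full path (LRU)
  int storeLookups = 0;

  ParentPathLookupDemo(int cacheSize) {
    dirPathCache = new LinkedHashMap<Long, String>(16, 0.75f, true) {
      @Override
      protected boolean removeEldestEntry(Map.Entry<Long, String> eldest) {
        return size() > cacheSize;
      }
    };
  }

  String getParentPath(long inode) {
    if (inode == ROOT) {
      return "/";
    }
    storeLookups++; // dirChildMap lookup: once per inode
    long parent = dirChildMap.get(inode);
    if (!dirPathCache.containsKey(parent)) {
      storeLookups++; // dirMap lookup: only on a cache miss
      String name = dirMap.get(parent);
      String grandParentPath = getParentPath(parent);
      String path = grandParentPath.endsWith("/")
          ? grandParentPath + name : grandParentPath + "/" + name;
      dirPathCache.put(parent, path);
    }
    return dirPathCache.get(parent);
  }

  public static void main(String[] args) {
    ParentPathLookupDemo d = new ParentPathLookupDemo(2);
    d.dirMap.put(ROOT, "");             // root has an empty local name
    d.dirMap.put(16386L, "user");
    d.dirMap.put(16387L, "alice");
    d.dirChildMap.put(16386L, ROOT);    // /user
    d.dirChildMap.put(16387L, 16386L);  // /user/alice
    for (long f = 16388L; f <= 16390L; f++) {
      d.dirChildMap.put(f, 16387L);     // three files stored contiguously in /user/alice
    }
    for (long f = 16388L; f <= 16390L; f++) {
      int before = d.storeLookups;
      String p = d.getParentPath(f);
      System.out.println(f + " -> " + p + ", store lookups: " + (d.storeLookups - before));
    }
  }
}
{code}

Running it reports 6 store lookups for the first file (cold cache, walking up to the root) and 1 for each of the other two. If siblings were interleaved with many other directories, more lookups would miss the cache, which is why the contiguity assumption above matters.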
Does it make sense to you?
> Add Delimited format supports for PB OIV tool
> ---------------------------------------------
>
> Key: HDFS-6673
> URL: https://issues.apache.org/jira/browse/HDFS-6673
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Affects Versions: 2.4.0
> Reporter: Lei (Eddy) Xu
> Assignee: Lei (Eddy) Xu
> Priority: Minor
> Attachments: HDFS-6673.000.patch, HDFS-6673.001.patch,
> HDFS-6673.002.patch, HDFS-6673.003.patch, HDFS-6673.004.patch,
> HDFS-6673.005.patch
>
>
> The new oiv tool, which is designed for Protobuf fsimage, lacks a few
> features supported in the old {{oiv}} tool.
> This task adds support for the _Delimited_ processor to the oiv tool.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)