[ https://issues.apache.org/jira/browse/HDFS-6673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14284580#comment-14284580 ]

Lei (Eddy) Xu commented on HDFS-6673:
-------------------------------------

Hi, [~wheat9]

Thank you very much for pointing this out. In your patch, you dump the inodes 
into LevelDB sorted by their parent ID. I have tried this method, but in my 
experiments the cost of dumping the inodes into LevelDB and then scanning it 
_sequentially_ outweighs the benefit of the sequential scan.
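For comparison, the sorted-by-parent approach can be sketched as follows. Encoding each key as the big-endian parent ID followed by the big-endian inode ID makes a byte-wise key order (LevelDB's default) group all children of a directory together, so listing one directory becomes a single sequential range scan. In this hypothetical sketch a {{TreeMap}} with an unsigned byte comparator (Java 9+) stands in for LevelDB; {{ParentSortedIndex}}, {{key}} and {{childrenOf}} are illustrative names, not code from either patch.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeMap;

// Hypothetical sketch: a TreeMap ordered by unsigned byte comparison stands in
// for LevelDB, whose default comparator also orders keys byte-wise.
class ParentSortedIndex {
  private final TreeMap<byte[], String> store =
      new TreeMap<byte[], String>(Arrays::compareUnsigned);

  // Big-endian encoding preserves numeric order for non-negative IDs, so all
  // children of one parent are adjacent in key order.
  static byte[] key(long parentId, long inodeId) {
    return ByteBuffer.allocate(16).putLong(parentId).putLong(inodeId).array();
  }

  void put(long parentId, long inodeId, String name) {
    store.put(key(parentId, inodeId), name);
  }

  // One sequential range scan over [key(parent, 0), key(parent + 1, 0)).
  List<String> childrenOf(long parentId) {
    return new ArrayList<>(
        store.subMap(key(parentId, 0), true, key(parentId + 1, 0), false).values());
  }
}
```

The price of this layout is the extra pass that writes every inode into the index before any scan can start.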

In the current patch, I assume that the children of a directory are written to 
the fsimage contiguously. Thus, in the second pass over the INode section that 
generates the text output, the path of the parent directory tends to stay in 
the LRU cache while its children are being processed, as shown in:

{code}
    @Override
    public String getParentPath(long inode) throws IOException {
      if (inode == INodeId.ROOT_INODE_ID) {
        return "/";
      }
      // One lookup per inode: child inode ID -> parent directory ID.
      byte[] bytes = dirChildMap.get(toBytes(inode));
      Preconditions.checkState(bytes != null && bytes.length == 8,
          "Cannot find parent directory for inode %s, "
              + "the fsimage might be corrupted.", inode);
      long parent = toLong(bytes);
      // On a cache miss, look up the parent's name and build its full path
      // recursively; siblings written contiguously then hit the cache.
      if (!dirPathCache.containsKey(parent)) {
        bytes = dirMap.get(toBytes(parent));
        if (parent != INodeId.ROOT_INODE_ID) {
          Preconditions.checkState(bytes != null,
              "Cannot find parent directory for inode %s, "
                  + "the fsimage might be corrupted.", parent);
        }
        String parentName = toString(bytes);
        String parentPath =
            new File(getParentPath(parent), parentName).toString();
        dirPathCache.put(parent, parentPath);
      }
      return dirPathCache.get(parent);
    }
{code}
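The snippet relies on {{toBytes}}, {{toLong}} and {{toString}} helpers that are not shown. A minimal sketch of what they could look like, assuming inode IDs are stored as 8-byte big-endian longs (consistent with the {{bytes.length == 8}} check) and names as UTF-8; the patch may encode them differently, and {{KeyCodec}} is a hypothetical name. Mapping {{null}} to an empty string is also an assumption, chosen because {{toString(bytes)}} is reached with no {{dirMap}} entry when the parent is the root.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical encoding helpers; the patch's real versions are not shown.
final class KeyCodec {
  private KeyCodec() {}

  // Encode an inode ID as 8 big-endian bytes (assumed key format).
  static byte[] toBytes(long value) {
    return ByteBuffer.allocate(Long.BYTES).putLong(value).array();
  }

  // Decode 8 big-endian bytes back into an inode ID.
  static long toLong(byte[] bytes) {
    if (bytes == null || bytes.length != Long.BYTES) {
      throw new IllegalArgumentException("expected 8 bytes");
    }
    return ByteBuffer.wrap(bytes).getLong();
  }

  // Decode a UTF-8 name; null (assumed: no entry for the root) becomes "".
  static String toString(byte[] bytes) {
    return bytes == null ? "" : new String(bytes, StandardCharsets.UTF_8);
  }
}
```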

Thus, even though it is not a completely sequential scan by directory ID, it 
involves only one seek per INode (the child-to-parent lookup) as long as the 
parent's path is cached.
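To illustrate why the contiguity assumption matters, the hypothetical sketch below replays the {{getParentPath}} logic against in-memory maps and counts lookups. Plain {{HashMap}}s stand in for the LevelDB-backed {{dirChildMap}} and {{dirMap}}, a small access-ordered {{LinkedHashMap}} stands in for the LRU {{dirPathCache}}, and the root ID 16385 is an assumption for this sketch; {{LookupCounter}} and its methods are illustrative names only.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical replay of getParentPath() over in-memory maps, counting lookups.
class LookupCounter {
  static final long ROOT = 16385L; // root inode ID assumed for this sketch

  final Map<Long, Long> childToParent = new HashMap<>(); // stands in for dirChildMap
  final Map<Long, String> dirNames = new HashMap<>();    // stands in for dirMap
  final Map<Long, String> pathCache;                     // bounded LRU cache
  int parentLookups = 0; // child -> parent lookups (one per call)
  int nameLookups = 0;   // directory-name lookups (only on cache misses)

  LookupCounter(int cacheSize) {
    // accessOrder = true: get() moves an entry to the most-recently-used end.
    pathCache = new LinkedHashMap<Long, String>(16, 0.75f, true) {
      @Override
      protected boolean removeEldestEntry(Map.Entry<Long, String> eldest) {
        return size() > cacheSize;
      }
    };
  }

  void addDirectory(long id, long parent, String name) {
    childToParent.put(id, parent);
    dirNames.put(id, name);
  }

  void addFile(long id, long parent) {
    childToParent.put(id, parent);
  }

  // Returns the path of the directory containing `inode`.
  String getParentPath(long inode) {
    if (inode == ROOT) {
      return "/";
    }
    parentLookups++;
    Long parent = childToParent.get(inode);
    if (parent == null) {
      throw new IllegalStateException("Cannot find parent of inode " + inode);
    }
    String cached = pathCache.get(parent); // a hit also refreshes LRU order
    if (cached != null) {
      return cached;
    }
    String path;
    if (parent == ROOT) {
      path = "/";
    } else {
      nameLookups++;
      String grandParentPath = getParentPath(parent);
      String name = dirNames.get(parent);
      path = grandParentPath.equals("/") ? "/" + name : grandParentPath + "/" + name;
    }
    pathCache.put(parent, path);
    return path;
  }
}
```

With a two-entry cache and two directories holding three and two files, visiting the files directory by directory costs 7 child-to-parent lookups and 2 name lookups; interleaving the same files across both directories raises this to 10 and 5, because each switch evicts the other directory's path.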

Does it make sense to you?

> Add Delimited format supports for PB OIV tool
> ---------------------------------------------
>
>                 Key: HDFS-6673
>                 URL: https://issues.apache.org/jira/browse/HDFS-6673
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>    Affects Versions: 2.4.0
>            Reporter: Lei (Eddy) Xu
>            Assignee: Lei (Eddy) Xu
>            Priority: Minor
>         Attachments: HDFS-6673.000.patch, HDFS-6673.001.patch, 
> HDFS-6673.002.patch, HDFS-6673.003.patch, HDFS-6673.004.patch, 
> HDFS-6673.005.patch
>
>
> The new oiv tool, which is designed for the Protobuf fsimage, lacks a few 
> features supported by the old {{oiv}} tool. 
> This task adds support for the _Delimited_ processor to the oiv tool. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
