[
https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14364067#comment-14364067
]
Jing Zhao commented on HDFS-7878:
---------------------------------
bq. Colin wrote: if the client makes two different calls to getFileStatus...
since we're doing 2x the RPCs to the NameNode that we need to...
The current patch only makes one getFileStatus RPC.
bq. Colin wrote: Then FileStatus objects returned from HDFS (and any other
filesystem that has user-visible inode IDs) can return the inode ID
This FileStatus object returned from HDFS is called "HdfsFileStatus"...
bq. Colin wrote: We do 1/2 the RPCs of the current patch, put 1/2 the load on
the NN, and don't open up another race condition.
Again, the current patch makes only ONE RPC. Your proposed approach is exactly
what the current patch already does, except that we already have
HdfsFileStatus, which contains the file ID. Instead of changing or extending
a public and stable interface like FileStatus, the easiest way is to keep
this API inside DistributedFileSystem only and simply return the file ID
contained in HdfsFileStatus.
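
To make the point concrete, here is a minimal, self-contained sketch of that shape. It is not the patch itself: StatusWithId, FakeNameNode, FakeDistributedFileSystem and the getFileId method name are hypothetical stand-ins (for HdfsFileStatus, the NameNode, and DistributedFileSystem respectively) so the example runs without Hadoop on the classpath. It only illustrates that one status lookup is enough, because the returned status already carries the file ID.

```java
import java.io.FileNotFoundException;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

public class FileIdSketch {

    /** Hypothetical stand-in for HdfsFileStatus: the status already carries the file ID. */
    static final class StatusWithId {
        final String path;
        final long fileId;

        StatusWithId(String path, long fileId) {
            this.path = path;
            this.fileId = fileId;
        }
    }

    /** Hypothetical stand-in for the NameNode; counts how many status RPCs it serves. */
    static final class FakeNameNode {
        private final Map<String, StatusWithId> files = new HashMap<>();
        int rpcCount = 0;

        void addFile(String path, long fileId) {
            files.put(path, new StatusWithId(path, fileId));
        }

        StatusWithId getFileInfo(String path) {
            rpcCount++;
            return files.get(path); // null if the path does not exist
        }
    }

    /** Hypothetical stand-in for DistributedFileSystem; the ID API lives only here. */
    static final class FakeDistributedFileSystem {
        private final FakeNameNode nameNode;

        FakeDistributedFileSystem(FakeNameNode nameNode) {
            this.nameNode = nameNode;
        }

        /** Returns the file ID using a single status lookup (assumed method name). */
        long getFileId(String path) {
            StatusWithId status = nameNode.getFileInfo(path); // the only RPC
            if (status == null) {
                throw new UncheckedIOException(new FileNotFoundException(path));
            }
            return status.fileId;
        }
    }

    public static void main(String[] args) {
        FakeNameNode nn = new FakeNameNode();
        nn.addFile("/data/part-0", 16386L);
        FakeDistributedFileSystem fs = new FakeDistributedFileSystem(nn);
        System.out.println("fileId=" + fs.getFileId("/data/part-0") + " rpcs=" + nn.rpcCount);
    }
}
```

The generic FileStatus-like interface is left untouched in this sketch; only the HDFS-specific class exposes the ID, which is the trade-off argued for above.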
> API - expose an unique file identifier
> --------------------------------------
>
> Key: HDFS-7878
> URL: https://issues.apache.org/jira/browse/HDFS-7878
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: Sergey Shelukhin
> Assignee: Sergey Shelukhin
> Attachments: HDFS-7878.01.patch, HDFS-7878.02.patch, HDFS-7878.patch
>
>
> See HDFS-487.
> Even though that is resolved as duplicate, the ID is actually not exposed by
> the JIRA it supposedly duplicates.
> INode ID for the file should be easy to expose; alternatively ID could be
> derived from block IDs, to account for appends...
> This is useful e.g. for cache key by file, to make sure cache stays correct
> when file is overwritten.