[ https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14364389#comment-14364389 ]
Colin Patrick McCabe commented on HDFS-7878:
--------------------------------------------
Hi [~jingzhao],
I understand that the current patch makes only one RPC. My concern is that, in
order to get both the file ID and the other information about the file, a
caller will need two RPCs when using this API. That is a poor API from an
efficiency point of view, and from a concurrency point of view, since the file
can change between the two calls. It would be as if you needed five separate
RPCs to get the length, access time, modification time, group, and user. There
is a reason why FileStatus combines all these fields into one class, and the
same logic should lead us to include a way of getting the file ID from
FileStatus.
I understand that HdfsFileStatus already has {{HdfsFileStatus#getFileId}}. But
this functionality should not be HDFS-specific. Other filesystems also have
file IDs. (Plus, if casting to HdfsFileStatus were adequate, there would be no
need for this JIRA at all, since that cast is already possible.)
Why not just add an accessor function to {{FileStatus}} with a default
implementation, as I suggested earlier? Adding a new function to a stable
interface is a compatible change (and I note that adding a new function to
FileSystem also means adding a new function to a stable class).
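To illustrate the shape of the proposal, here is a minimal sketch of a base status class that gains an ID accessor with a default implementation, plus a subclass for a filesystem that does track IDs. The classes `BasicFileStatus` and `IdBearingFileStatus` are hypothetical stand-ins, not Hadoop's real `FileStatus`/`HdfsFileStatus`, and the default used here (an empty `Optional`) is an assumption, not necessarily the default suggested earlier in this thread.

```java
import java.util.Optional;

public class FileIdSketch {

    // Hypothetical stand-in for a FileStatus-like class; NOT org.apache.hadoop.fs.FileStatus.
    static class BasicFileStatus {
        private final String path;
        private final long length;
        private final long modificationTime;

        BasicFileStatus(String path, long length, long modificationTime) {
            this.path = path;
            this.length = length;
            this.modificationTime = modificationTime;
        }

        String getPath() { return path; }
        long getLen() { return length; }
        long getModificationTime() { return modificationTime; }

        // Hypothetical new accessor. The default reports "no ID available",
        // so existing subclasses compile and behave exactly as before.
        Optional<Long> getFileId() {
            return Optional.empty();
        }
    }

    // Hypothetical subclass for a filesystem that tracks a unique ID (e.g. an inode ID).
    static class IdBearingFileStatus extends BasicFileStatus {
        private final long fileId;

        IdBearingFileStatus(String path, long length, long modificationTime, long fileId) {
            super(path, length, modificationTime);
            this.fileId = fileId;
        }

        @Override
        Optional<Long> getFileId() {
            return Optional.of(fileId);
        }
    }

    public static void main(String[] args) {
        // One status object carries the length, mtime, and ID together,
        // so a caller needs only one lookup to get all of them.
        BasicFileStatus plain = new BasicFileStatus("/data/a", 10L, 1000L);
        BasicFileStatus withId = new IdBearingFileStatus("/data/b", 20L, 2000L, 16386L);
        System.out.println(plain.getFileId().isPresent());  // false
        System.out.println(withId.getFileId().orElse(-1L)); // 16386
    }
}
```

Because the base class supplies a default body, filesystems that have no notion of a file ID need no changes, which is the compatibility argument made above.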
> API - expose an unique file identifier
> --------------------------------------
>
> Key: HDFS-7878
> URL: https://issues.apache.org/jira/browse/HDFS-7878
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: Sergey Shelukhin
> Assignee: Sergey Shelukhin
> Attachments: HDFS-7878.01.patch, HDFS-7878.02.patch, HDFS-7878.patch
>
>
> See HDFS-487.
> Even though that is resolved as duplicate, the ID is actually not exposed by
> the JIRA it supposedly duplicates.
> The INode ID for the file should be easy to expose; alternatively, the ID could be
> derived from block IDs, to account for appends...
> This is useful, e.g., as a per-file cache key, to make sure the cache stays correct
> when the file is overwritten.
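The cache use case can be sketched as a map keyed by path plus file ID. This assumes that overwriting a file yields a new ID (as with a fresh inode), while a read of an unchanged file sees the same ID; the class and method names are hypothetical, and the IDs are made-up values for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

public class FileIdCacheSketch {

    // Hypothetical cache key combining the path with a unique file ID.
    static final class FileKey {
        final String path;
        final long fileId;

        FileKey(String path, long fileId) {
            this.path = path;
            this.fileId = fileId;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof FileKey)) return false;
            FileKey other = (FileKey) o;
            return fileId == other.fileId && path.equals(other.path);
        }

        @Override
        public int hashCode() {
            return Objects.hash(path, fileId);
        }
    }

    private final Map<FileKey, String> cache = new HashMap<>();

    // Returns the cached value, or null if this (path, ID) pair was never cached.
    String get(String path, long fileId) {
        return cache.get(new FileKey(path, fileId));
    }

    void put(String path, long fileId, String contents) {
        cache.put(new FileKey(path, fileId), contents);
    }

    public static void main(String[] args) {
        FileIdCacheSketch c = new FileIdCacheSketch();
        c.put("/data/a", 16386L, "old contents");
        // Same path and same ID: cache hit.
        System.out.println(c.get("/data/a", 16386L)); // old contents
        // File overwritten, so it has a new ID: the stale entry is not returned.
        System.out.println(c.get("/data/a", 16400L)); // null
    }
}
```

Stale entries are never evicted in this sketch; a real cache would need a size bound or eviction policy. Note also that an ID that stays the same across appends would not invalidate on append, which is why the description mentions deriving the ID from block IDs as an alternative.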
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)