[jira] [Commented] (HDFS-7878) API - expose an unique file identifier

Colin Patrick McCabe (JIRA) Mon, 16 Mar 2015 18:38:14 -0700

    [ 
https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14364389#comment-14364389
 ]


Colin Patrick McCabe commented on HDFS-7878:
--------------------------------------------

Hi [~jingzhao],

I understand that the current patch only makes one RPC.  My concern is that in 
order to get both the file ID and the other information about the file, two 
RPCs will be needed when using this API.  That makes it a bad API from an 
efficiency and concurrency point of view.  It would be as if you needed five 
separate RPCs to get the length, access time, modification time, group, and 
user.  There is a reason why FileStatus combines all of these fields into 
one class, and that same logic should lead us to include a way of getting the 
file ID from FileStatus.
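
To illustrate the efficiency and concurrency point, here is a minimal sketch. 
It does not use the real Hadoop classes: {{FakeNameNode}}, {{StatusSnapshot}}, 
and all method names are hypothetical stand-ins, and "RPC" is simulated by a 
counter.  It only shows that two separate calls cost two round trips and can 
observe two different versions of the file, while one combined call cannot.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical value object: everything returned by one combined call. */
final class StatusSnapshot {
    final long fileId;
    final long length;

    StatusSnapshot(long fileId, long length) {
        this.fileId = fileId;
        this.length = length;
    }
}

/** Hypothetical stand-in for a remote metadata server; counts round trips. */
final class FakeNameNode {
    final AtomicInteger rpcCount = new AtomicInteger();
    private long fileId = 1001L;
    private long length = 42L;

    // Separate-call style: one simulated RPC per piece of information.
    synchronized long getFileId(String path) {
        rpcCount.incrementAndGet();
        return fileId;
    }

    synchronized long getLength(String path) {
        rpcCount.incrementAndGet();
        return length;
    }

    // Combined style: one simulated RPC returns a consistent snapshot.
    synchronized StatusSnapshot getStatus(String path) {
        rpcCount.incrementAndGet();
        return new StatusSnapshot(fileId, length);
    }

    // Simulates another client overwriting the file: new ID, new length.
    synchronized void overwrite(long newLength) {
        fileId++;
        length = newLength;
    }
}

final class RpcCountDemo {
    public static void main(String[] args) {
        FakeNameNode nn = new FakeNameNode();

        long id = nn.getFileId("/f");   // sees the old file
        nn.overwrite(7L);               // file replaced between the two calls
        long len = nn.getLength("/f");  // sees the new file
        System.out.println("two calls: id=" + id + " length=" + len
                + " rpcs=" + nn.rpcCount.get());

        StatusSnapshot s = nn.getStatus("/f");
        System.out.println("one call:  id=" + s.fileId + " length=" + s.length);
    }
}
```

In the two-call case the ID and the length belong to different versions of the 
file, which is exactly what a cache keyed by file ID would want to avoid.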

I understand that HdfsFileStatus already has {{HdfsFileStatus#getFileId}}.  But 
this functionality should not be HDFS-specific.  There are other filesystems 
that have file IDs.  (Plus, if casting to HdfsFileStatus were adequate, there 
would be no need for this JIRA at all, since this cast is already possible.)

Why not just add an accessor function to {{FileStatus}} with a default 
implementation, as I suggested earlier?  Adding a new function to a stable 
interface is a compatible change (and I note that adding a new function to 
FileSystem is also adding a new function to a stable class).
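
A rough sketch of what I mean, under stated assumptions: the classes below are 
hypothetical stand-ins, not the real {{FileStatus}} or {{HdfsFileStatus}}, and 
the {{UNKNOWN_FILE_ID}} sentinel and the {{getFileId}} name on the base class 
are illustrative choices rather than anything in the patch.

```java
/** Hypothetical stand-in for a generic, filesystem-neutral file status. */
class SketchFileStatus {
    /** Assumed sentinel meaning "this filesystem does not expose file IDs". */
    static final long UNKNOWN_FILE_ID = -1L;

    private final String path;
    private final long length;

    SketchFileStatus(String path, long length) {
        this.path = path;
        this.length = length;
    }

    public String getPath() { return path; }

    public long getLength() { return length; }

    // New accessor with a default implementation. Existing subclasses that
    // know nothing about file IDs keep compiling and keep working.
    public long getFileId() { return UNKNOWN_FILE_ID; }
}

/** Hypothetical stand-in for a filesystem-specific status that has an ID. */
class SketchHdfsFileStatus extends SketchFileStatus {
    private final long fileId;

    SketchHdfsFileStatus(String path, long length, long fileId) {
        super(path, length);
        this.fileId = fileId;
    }

    @Override
    public long getFileId() { return fileId; }
}

final class FileIdAccessorDemo {
    // Caller code depends only on the base type; no cast is needed.
    static String cacheKey(SketchFileStatus st) {
        long id = st.getFileId();
        return id == SketchFileStatus.UNKNOWN_FILE_ID
                ? st.getPath()
                : st.getPath() + "#" + id;
    }

    public static void main(String[] args) {
        System.out.println(cacheKey(new SketchFileStatus("/a", 10L)));
        System.out.println(cacheKey(new SketchHdfsFileStatus("/b", 20L, 16389L)));
    }
}
```

The file ID arrives in the same object as the length and the other fields, so 
no second call is needed, and filesystems without IDs simply inherit the 
default.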

> API - expose an unique file identifier
> --------------------------------------
>
>                 Key: HDFS-7878
>                 URL: https://issues.apache.org/jira/browse/HDFS-7878
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Sergey Shelukhin
>            Assignee: Sergey Shelukhin
>         Attachments: HDFS-7878.01.patch, HDFS-7878.02.patch, HDFS-7878.patch
>
>
> See HDFS-487.
> Even though that is resolved as duplicate, the ID is actually not exposed by 
> the JIRA it supposedly duplicates.
> INode ID for the file should be easy to expose; alternatively ID could be 
> derived from block IDs, to account for appends...
> This is useful e.g. for cache key by file, to make sure cache stays correct 
> when file is overwritten.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
