[jira] [Commented] (HDFS-7878) API - expose an unique file identifier

Colin Patrick McCabe (JIRA) Thu, 26 Mar 2015 13:15:08 -0700

    [ 
https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14382570#comment-14382570
 ]


Colin Patrick McCabe commented on HDFS-7878:
--------------------------------------------

I agree with [~jingzhao] that a nullable field is better than a subclass.  
If we go down the subclass route, we will end up with 2^N classes, one for 
each combination of N nullable fields.  Offhand, I can think of at least 5 
fields that ought to be nullable (symlink, block_replication, blocksize, 
access_time, fileId).  Keep in mind that block_replication, blocksize, and 
access_time are not relevant to directories, and the symlink field is only 
relevant to symlinks.  I think it's better to always have fileId present as 
a Long, set to null when it's not relevant.  We can optimize memory 
consumption later with things like bitfields, cramming things into longs, 
etc.
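
A rough sketch of what the nullable-field approach could look like is given 
here.  SketchFileStatus and its factory methods are made up for illustration 
and are not the real FileStatus API; the point is only that a single class 
with boxed fields covers every combination, where subclasses would need one 
class per combination.

```java
// Hypothetical sketch: SketchFileStatus is an illustrative stand-in, not the
// real org.apache.hadoop.fs.FileStatus.
final class SketchFileStatus {
    private final String path;
    private final boolean directory;
    // Boxed types, so that null can mean "not relevant for this entry".
    private final Short blockReplication; // null for directories
    private final Long blockSize;         // null for directories
    private final Long accessTime;        // null for directories
    private final String symlink;         // null unless the entry is a symlink
    private final Long fileId;            // null when no ID is available

    private SketchFileStatus(String path, boolean directory,
                             Short blockReplication, Long blockSize,
                             Long accessTime, String symlink, Long fileId) {
        this.path = path;
        this.directory = directory;
        this.blockReplication = blockReplication;
        this.blockSize = blockSize;
        this.accessTime = accessTime;
        this.symlink = symlink;
        this.fileId = fileId;
    }

    // Directories leave the block- and time-related fields unset.
    static SketchFileStatus forDirectory(String path, Long fileId) {
        return new SketchFileStatus(path, true, null, null, null, null, fileId);
    }

    // Regular files set the block fields; fileId may still be null.
    static SketchFileStatus forFile(String path, short blockReplication,
                                    long blockSize, long accessTime,
                                    Long fileId) {
        return new SketchFileStatus(path, false, blockReplication, blockSize,
                                    accessTime, null, fileId);
    }

    String getPath() { return path; }
    boolean isDirectory() { return directory; }
    Short getBlockReplication() { return blockReplication; }
    Long getBlockSize() { return blockSize; }
    Long getAccessTime() { return accessTime; }
    String getSymlink() { return symlink; }
    Long getFileId() { return fileId; }
}
```

Callers have to null-check before unboxing any of these getters, which is 
the cost of this design compared with subclasses.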

I also think that in Hadoop 3, we should make FileStatus use protobuf to 
serialize itself so that we don't have these problems when adding new 
fields.  But I think there is already a JIRA for that.

> API - expose an unique file identifier
> --------------------------------------
>
>                 Key: HDFS-7878
>                 URL: https://issues.apache.org/jira/browse/HDFS-7878
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Sergey Shelukhin
>            Assignee: Sergey Shelukhin
>         Attachments: HDFS-7878.01.patch, HDFS-7878.02.patch, 
> HDFS-7878.03.patch, HDFS-7878.04.patch, HDFS-7878.patch
>
>
> See HDFS-487.
> Even though that is resolved as duplicate, the ID is actually not exposed by 
> the JIRA it supposedly duplicates.
> INode ID for the file should be easy to expose; alternatively ID could be 
> derived from block IDs, to account for appends...
> This is useful e.g. for cache key by file, to make sure cache stays correct 
> when file is overwritten.
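
The cache use case from the description might look roughly like the sketch 
below.  FileIdCache is a made-up class, and the sketch assumes that 
overwriting a file gives it a new ID (as a new inode would), so that a 
lookup with the new ID misses the stale entry instead of returning old data.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a cache keyed by file ID rather than by path.
// Assumes an overwritten file is assigned a new ID.
final class FileIdCache {
    private final Map<Long, String> entries = new HashMap<>();

    // Store a cached value under the ID of the file it was computed from.
    void put(long fileId, String value) {
        entries.put(fileId, value);
    }

    // Returns the cached value, or null if nothing is cached for this ID.
    String get(long fileId) {
        return entries.get(fileId);
    }
}
```

A path-keyed cache would need explicit invalidation on overwrite; with an 
ID key, entries for the old file are simply never looked up again (although 
they would still need to be evicted eventually).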



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
