[ https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14382570#comment-14382570 ]
Colin Patrick McCabe commented on HDFS-7878:
--------------------------------------------
I agree with [~jingzhao] that having a nullable field is better than having a
subclass. If we start going down the subclass route, we will end up with 2^N
classes, one for each combination of N nullable fields. Offhand, I can think of at least 5
fields which ought to be nullable (symlink, block_replication, blocksize,
access_time, fileId). Keep in mind that block_replication, blocksize, and
access_time are not relevant to directories, and the symlink field is only
relevant to symlinks. I think it's better to just have fileId always present as
a boxed Long, set to null when it's not relevant. We can optimize the memory
consumption further later with things like bitfields, cramming things into longs,
etc.
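
To make the nullable-field approach concrete, here is a minimal sketch. It is
not the actual FileStatus class or any attached patch: the class name
FileStatusSketch, the factory methods, and the getters are all hypothetical,
and only the five field names come from the list above. A single class covers
both files and directories, with null marking fields that are not relevant.

```java
/** Hypothetical stand-in for FileStatus; not the real Hadoop class. */
final class FileStatusSketch {
    private final String path;
    private final boolean directory;

    // Boxed types so that null can mean "not relevant for this entry".
    private final Long fileId;            // null when not relevant
    private final Short blockReplication; // null for directories
    private final Long blockSize;         // null for directories
    private final Long accessTime;        // null for directories
    private final String symlink;         // null unless the entry is a symlink

    private FileStatusSketch(String path, boolean directory, Long fileId,
                             Short blockReplication, Long blockSize,
                             Long accessTime, String symlink) {
        this.path = path;
        this.directory = directory;
        this.fileId = fileId;
        this.blockReplication = blockReplication;
        this.blockSize = blockSize;
        this.accessTime = accessTime;
        this.symlink = symlink;
    }

    /** Regular file: block-related fields are populated. */
    static FileStatusSketch forFile(String path, Long fileId,
                                    short blockReplication, long blockSize,
                                    long accessTime) {
        return new FileStatusSketch(path, false, fileId, blockReplication,
                                    blockSize, accessTime, null);
    }

    /** Directory: block-related fields stay null instead of needing a subclass. */
    static FileStatusSketch forDirectory(String path, Long fileId) {
        return new FileStatusSketch(path, true, fileId, null, null, null, null);
    }

    String getPath() { return path; }
    boolean isDirectory() { return directory; }
    Long getFileId() { return fileId; }
    Short getBlockReplication() { return blockReplication; }
    Long getBlockSize() { return blockSize; }
    Long getAccessTime() { return accessTime; }
    String getSymlink() { return symlink; }
}
```

Callers have to null-check before unboxing, e.g. test getFileId() != null
before using it as a cache key. That is the cost of avoiding a separate
subclass per combination of fields.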
I also think that in Hadoop 3, we should make FileStatus use protobuf to
serialize itself so that we don't have these problems with adding new fields.
But I think there is already a JIRA for that.
> API - expose an unique file identifier
> --------------------------------------
>
> Key: HDFS-7878
> URL: https://issues.apache.org/jira/browse/HDFS-7878
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: Sergey Shelukhin
> Assignee: Sergey Shelukhin
> Attachments: HDFS-7878.01.patch, HDFS-7878.02.patch,
> HDFS-7878.03.patch, HDFS-7878.04.patch, HDFS-7878.patch
>
>
> See HDFS-487.
> Even though that is resolved as duplicate, the ID is actually not exposed by
> the JIRA it supposedly duplicates.
> INode ID for the file should be easy to expose; alternatively ID could be
> derived from block IDs, to account for appends...
> This is useful e.g. for cache key by file, to make sure cache stays correct
> when file is overwritten.