[ https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15514987#comment-15514987 ]
Chris Douglas commented on HDFS-7878:
-------------------------------------
bq. I'd highlight YARN app submission as a key use case; it currently uses
timestamps and gets very confused if you overwrite something, even if its
contents are unchanged
YARN {{LocalResourceProto}} could replace some of its metadata with a
{{FileStatusProto}} from HDFS-6984, if it were available. That should include
this identifier.
bq. The other way to expose it would be from a byte[] of version info
An opaque {{byte[]}} is awkward with multiple args. A caller would need to pass
a {{Map<Path,byte[]>}}, arrays of {{Path}} and {{byte[]}}, or a composite type
(kind of redundant, given {{FileStatus}} exists). We could add a
{{FileSystem#getHandle(FileStatus)}} API to return an opaque, serializable type
that's the minimal set of bytes to refer to that entity. I suppose we could
mandate {{BikeShed#getPath()}} so it's not too annoying, but since it would be
a subset of {{FileStatus}}, we'd only be saving a handful of bytes. Particularly
after HDFS-6984, if someone wants to store a few million of them efficiently,
there are better methods for that.
That said, I agree that {{FileStatusProto}} should represent the {{BikeShed}}
as bytes.
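To make the trade-off above concrete, here is a rough sketch of the two shapes.
It is illustrative only and does not use the Hadoop classes: {{Status}},
{{Handle}}, {{getHandle}}, {{openAllRaw}} and {{openAll}} are hypothetical
stand-ins for {{FileStatus}}, {{BikeShed}}, the proposed
{{FileSystem#getHandle(FileStatus)}}, and multi-argument callers. Encoding the
id as 8 big-endian bytes is also just an assumption for the example.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

public class HandleSketch {
    /** Hypothetical stand-in for FileStatus: a path plus a store-specific id. */
    static final class Status {
        final String path;
        final long fileId;
        Status(String path, long fileId) { this.path = path; this.fileId = fileId; }
    }

    /** Hypothetical opaque handle: callers see a path and bytes, never the id type. */
    static final class Handle {
        private final String path;
        private final byte[] opaque;
        Handle(String path, byte[] opaque) {
            this.path = path;
            this.opaque = opaque.clone(); // defensive copy keeps the handle immutable
        }
        String getPath() { return path; }
        byte[] toBytes() { return opaque.clone(); }
        @Override public boolean equals(Object o) {
            return o instanceof Handle
                && path.equals(((Handle) o).path)
                && Arrays.equals(opaque, ((Handle) o).opaque);
        }
        @Override public int hashCode() { return 31 * path.hashCode() + Arrays.hashCode(opaque); }
    }

    /** Hypothetical analogue of getHandle(FileStatus): packs the id into opaque bytes. */
    static Handle getHandle(Status s) {
        byte[] id = ByteBuffer.allocate(Long.BYTES).putLong(s.fileId).array();
        return new Handle(s.path, id);
    }

    /** Raw byte[] variant: the caller must keep paths and bytes paired up itself. */
    static int openAllRaw(Map<String, byte[]> versionsByPath) { return versionsByPath.size(); }

    /** Typed variant: each argument already carries its own path and bytes. */
    static int openAll(Handle... handles) { return handles.length; }

    public static void main(String[] args) {
        Handle a = getHandle(new Status("/data/a", 16386L));
        Handle b = getHandle(new Status("/data/b", 16387L));

        Map<String, byte[]> raw = new LinkedHashMap<>();
        raw.put(a.getPath(), a.toBytes());
        raw.put(b.getPath(), b.toBytes());

        System.out.println(openAllRaw(raw)); // caller built the map by hand
        System.out.println(openAll(a, b));   // handles passed directly
    }
}
```

With a typed handle, a multi-file call takes the handles directly; with a bare
{{byte[]}} the caller has to build and maintain the path-to-bytes pairing.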
bq. Additionally, InodeId looks implementation-specific to me, which makes this
API not useful to or be supported natively by other backend
To be clear, does this support using {{FileStatus}} in {{FileSystem}} APIs
rather than {{InodeId}}, e.g., {{FileSystem#open(InodeId)}}? Do you agree that
we should use a type for {{BikeShed}}, not just a long?
> API - expose an unique file identifier
> --------------------------------------
>
> Key: HDFS-7878
> URL: https://issues.apache.org/jira/browse/HDFS-7878
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: Sergey Shelukhin
> Assignee: Sergey Shelukhin
> Labels: BB2015-05-TBR
> Attachments: HDFS-7878.01.patch, HDFS-7878.02.patch,
> HDFS-7878.03.patch, HDFS-7878.04.patch, HDFS-7878.05.patch,
> HDFS-7878.06.patch, HDFS-7878.patch
>
>
> See HDFS-487.
> Even though that is resolved as duplicate, the ID is actually not exposed by
> the JIRA it supposedly duplicates.
> INode ID for the file should be easy to expose; alternatively ID could be
> derived from block IDs, to account for appends...
> This is useful e.g. for cache key by file, to make sure cache stays correct
> when file is overwritten.