Chris Douglas commented on HDFS-7878:

bq. I'd highlight YARN app submission as a key use case; it currently uses 
timestamps and gets very confused if you overwrite something, even if its 
contents are unchanged
YARN {{LocalResourceProto}} could replace some of its metadata with a 
{{FileStatusProto}} from HDFS-6984, if it were available. That 
{{FileStatusProto}} should include this identifier.

bq. The other way to expose it would be from a byte[] of version info
An opaque {{byte[]}} is awkward with multiple args. A caller would need to pass 
a {{Map<Path,byte[]>}}, parallel arrays of {{Path}} and {{byte[]}}, or a 
composite type (somewhat redundant, given that {{FileStatus}} exists). We could 
add a {{FileSystem#getHandle(FileStatus)}} API that returns an opaque, 
serializable type holding the minimal set of bytes needed to refer to that 
entity. I suppose we could mandate {{BikeShed#getPath()}} so it's not too 
annoying to use, but since it would be a subset of {{FileStatus}}, we'd only be 
saving a handful of bytes over {{FileStatus}}. Particularly after HDFS-6984, 
anyone who wants to store a few million of them efficiently has better methods 
for that.
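
To make the shape of that idea concrete, here is a rough sketch. It does not use any Hadoop classes: {{Status}} stands in for {{FileStatus}}, {{Handle}} for the {{BikeShed}} type, and the static {{getHandle}} for the proposed {{FileSystem#getHandle(FileStatus)}}. All of these names, and the choice to encode a long file id as 8 bytes, are assumptions made for illustration only.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class HandleSketch {

  /** Stand-in for FileStatus; only the fields this sketch needs. */
  static final class Status {
    final String path;
    final long modificationTime;
    final long fileId; // assumption: the store has a stable per-file id

    Status(String path, long modificationTime, long fileId) {
      this.path = path;
      this.modificationTime = modificationTime;
      this.fileId = fileId;
    }
  }

  /** Hypothetical "BikeShed": a path plus opaque, store-defined bytes. */
  static final class Handle {
    private final String path;
    private final byte[] opaque;

    Handle(String path, byte[] opaque) {
      this.path = path;
      this.opaque = opaque.clone();
    }

    String getPath() { return path; }

    /** Serialized form; callers store it but never interpret it. */
    byte[] toBytes() { return opaque.clone(); }

    @Override public boolean equals(Object o) {
      if (!(o instanceof Handle)) return false;
      Handle h = (Handle) o;
      return path.equals(h.path) && Arrays.equals(opaque, h.opaque);
    }

    @Override public int hashCode() {
      return 31 * path.hashCode() + Arrays.hashCode(opaque);
    }
  }

  /** Hypothetical analogue of FileSystem#getHandle(FileStatus). */
  static Handle getHandle(Status status) {
    byte[] id = ByteBuffer.allocate(Long.BYTES).putLong(status.fileId).array();
    return new Handle(status.path, id);
  }

  /** A multi-file call takes one list of handles: no Map, no parallel arrays. */
  static List<String> describe(List<Handle> handles) {
    List<String> out = new ArrayList<>();
    for (Handle h : handles) {
      out.add(h.getPath() + " (" + h.toBytes().length + " id bytes)");
    }
    return out;
  }

  public static void main(String[] args) {
    Handle h = getHandle(new Status("/data/a", 1000L, 16385L));
    System.out.println(describe(Arrays.asList(h)));
  }
}
```

Because the handle carries its own path, a method taking several of them needs only a single collection argument, which is the ergonomic point above. The cost is that it duplicates part of what {{FileStatus}} already holds.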

That said, I agree that {{FileStatusProto}} should represent the {{BikeShed}} 
as bytes.

bq. Additionally, InodeId looks implementation-specific to me, which makes this 
API not useful to or be supported natively by other backend

To be clear, is this an argument for using {{FileStatus}} in {{FileSystem}} 
APIs rather than {{InodeId}} (e.g., {{FileSystem#open(InodeId)}})? Do you agree 
that we should use a type for {{BikeShed}}, not just a long?
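
As a rough illustration of why a type could be preferable to a bare long: the hypothetical {{FileId}} below wraps whatever bytes a backend chooses, so an inode-based store and a store without inodes can both produce one. The class name, both factory methods, and the {{unchanged}} helper are invented for this sketch and are not part of any existing API.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class TypedIdSketch {

  /** Hypothetical identifier type; callers never interpret the bytes. */
  static final class FileId {
    private final byte[] bytes;

    private FileId(byte[] bytes) { this.bytes = bytes; }

    /** An inode-backed store can wrap its long id. */
    static FileId fromInode(long inodeId) {
      return new FileId(ByteBuffer.allocate(Long.BYTES).putLong(inodeId).array());
    }

    /** A store without inodes can wrap some other token it already has. */
    static FileId fromToken(String token) {
      return new FileId(token.getBytes(StandardCharsets.UTF_8));
    }

    @Override public boolean equals(Object o) {
      return o instanceof FileId && Arrays.equals(bytes, ((FileId) o).bytes);
    }

    @Override public int hashCode() { return Arrays.hashCode(bytes); }
  }

  /** Caller-side check (e.g. for a cache key) that is the same for any backend. */
  static boolean unchanged(FileId cached, FileId current) {
    return cached.equals(current);
  }

  public static void main(String[] args) {
    System.out.println(unchanged(FileId.fromInode(16385L), FileId.fromInode(16385L)));
    System.out.println(unchanged(FileId.fromToken("v1"), FileId.fromToken("v2")));
  }
}
```

With a plain long in the signature, the second factory would have no natural representation, which is the concern raised in the quoted comment.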

> API - expose an unique file identifier
> --------------------------------------
>                 Key: HDFS-7878
>                 URL: https://issues.apache.org/jira/browse/HDFS-7878
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Sergey Shelukhin
>            Assignee: Sergey Shelukhin
>              Labels: BB2015-05-TBR
>         Attachments: HDFS-7878.01.patch, HDFS-7878.02.patch, 
> HDFS-7878.03.patch, HDFS-7878.04.patch, HDFS-7878.05.patch, 
> HDFS-7878.06.patch, HDFS-7878.patch
> See HDFS-487.
> Even though that is resolved as duplicate, the ID is actually not exposed by 
> the JIRA it supposedly duplicates.
> INode ID for the file should be easy to expose; alternatively ID could be 
> derived from block IDs, to account for appends...
> This is useful e.g. for cache key by file, to make sure cache stays correct 
> when file is overwritten.
