[ https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15514987#comment-15514987 ]
Chris Douglas commented on HDFS-7878:
-------------------------------------

bq. I'd highlight YARN app submission as a key use case; it currently uses timestamps and gets very confused if you overwrite something, even if its contents are unchanged

YARN {{LocalResourceProto}} could replace some of its metadata with a {{FileStatusProto}} from HDFS-6984, if it were available. That should include this identifier.

bq. The other way to expose it would be from a byte[] of version info

An opaque {{byte[]}} is awkward with multiple args. A caller would need to pass a {{Map<Path,byte[]>}}, arrays of {{Path}} and {{byte[]}}, or a composite type (kind of redundant, given {{FileStatus}} exists).

We could add a {{FileSystem#getHandle(FileStatus)}} API to return an opaque, serializable type that's the minimal set of bytes to refer to that entity. I suppose we could mandate {{BikeShed#getPath()}} so it's not too annoying, but as a subset, we're saving a handful of bytes over {{FileStatus}}. Particularly after HDFS-6984, if someone wants to store a few million of them efficiently, there are better methods for that. That said, I agree that {{FileStatusProto}} should represent the {{BikeShed}} as bytes.

bq. Additionally, InodeId looks implementation-specific to me, which makes this API not useful to or be supported natively by other backend

To be clear, this supports using {{FileStatus}} in {{FileSystem}} APIs, rather than {{InodeId}}, e.g., {{FileSystem#open(InodeId)}}? Do you agree that we should use a type for {{BikeShed}}, not just a long?
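To make the {{FileSystem#getHandle(FileStatus)}} idea a little more concrete, here is a minimal Java sketch of an opaque, serializable handle type. {{FileHandle}}, {{ofInodeId}}, {{ofVersionTag}}, {{toBytes}} and {{fromBytes}} are hypothetical names for illustration, not existing Hadoop APIs. The sketch assumes each backend encodes whatever identifies a file (an inode id for HDFS, a version tag for some other store) into bytes that callers only compare and serialize, which is one reason a dedicated type may travel across backends better than a bare long.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

/**
 * Hypothetical opaque file handle: the minimal bytes a backend needs to
 * refer to one specific file entity. Callers compare and serialize it but
 * never interpret its contents.
 */
final class FileHandle {
  private final byte[] bytes;

  private FileHandle(byte[] bytes) {
    this.bytes = bytes.clone(); // defensive copy keeps the handle immutable
  }

  /** HDFS-style backend (assumption): encode an inode id as 8 big-endian bytes. */
  static FileHandle ofInodeId(long inodeId) {
    return new FileHandle(ByteBuffer.allocate(Long.BYTES).putLong(inodeId).array());
  }

  /** Store without inode ids (assumption): encode a version tag as UTF-8 bytes. */
  static FileHandle ofVersionTag(String tag) {
    return new FileHandle(tag.getBytes(StandardCharsets.UTF_8));
  }

  /** Serialized form, e.g. for storing in an application's own metadata. */
  byte[] toBytes() {
    return bytes.clone();
  }

  /** Rebuilds a handle from bytes previously returned by toBytes(). */
  static FileHandle fromBytes(byte[] serialized) {
    return new FileHandle(serialized);
  }

  @Override
  public boolean equals(Object o) {
    // Two handles refer to the same entity iff their opaque bytes match.
    return o instanceof FileHandle && Arrays.equals(bytes, ((FileHandle) o).bytes);
  }

  @Override
  public int hashCode() {
    return Arrays.hashCode(bytes);
  }

  @Override
  public String toString() {
    return "FileHandle[" + Base64.getEncoder().encodeToString(bytes) + "]";
  }
}
```

A caller would persist the {{toBytes()}} output (for example next to YARN resource metadata) and later check whether a path still names the same entity by comparing handles with {{equals}}; resolving a handle back to an open stream is outside this sketch.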
> API - expose an unique file identifier
> --------------------------------------
>
>                 Key: HDFS-7878
>                 URL: https://issues.apache.org/jira/browse/HDFS-7878
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Sergey Shelukhin
>            Assignee: Sergey Shelukhin
>              Labels: BB2015-05-TBR
>         Attachments: HDFS-7878.01.patch, HDFS-7878.02.patch,
> HDFS-7878.03.patch, HDFS-7878.04.patch, HDFS-7878.05.patch,
> HDFS-7878.06.patch, HDFS-7878.patch
>
>
> See HDFS-487.
> Even though that is resolved as duplicate, the ID is actually not exposed by
> the JIRA it supposedly duplicates.
> INode ID for the file should be easy to expose; alternatively ID could be
> derived from block IDs, to account for appends...
> This is useful e.g. for cache key by file, to make sure cache stays correct
> when file is overwritten.
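The cache-key use case from the description can be sketched as a map from path to (file id, value), where a lookup only hits if the file at that path still has the id recorded when the value was loaded. This is a minimal sketch assuming the caller already knows the current file id for the path (for example an inode id, if this issue exposes one) and that an overwrite produces a new id; {{FileIdCache}}, {{get}} and {{loadCount}} are hypothetical names, and fetching the id from the file system is not shown.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/**
 * Hypothetical per-file cache: an entry is valid only while the file at its
 * path still has the id recorded when the entry was loaded.
 */
final class FileIdCache<V> {
  /** Cached value plus the id of the file it was loaded from. */
  private static final class Entry<T> {
    final long fileId;
    final T value;

    Entry(long fileId, T value) {
      this.fileId = fileId;
      this.value = value;
    }
  }

  private final Map<String, Entry<V>> entries = new HashMap<>();
  private int loads = 0;

  /**
   * Returns the value for the file currently at {@code path}.
   * {@code currentFileId} is assumed to come from the file system (e.g. an inode id).
   */
  V get(String path, long currentFileId, Function<String, V> loader) {
    Entry<V> cached = entries.get(path);
    if (cached != null && cached.fileId == currentFileId) {
      return cached.value; // same file as when cached: hit
    }
    // Miss, or the path now names a different file (overwritten): reload and replace.
    V value = loader.apply(path);
    loads++;
    entries.put(path, new Entry<>(currentFileId, value));
    return value;
  }

  /** Number of times the loader has run, to observe hits versus reloads. */
  int loadCount() {
    return loads;
  }
}
```

Unlike a timestamp check, this comparison does not depend on the overwritten file having a different modification time or length; it only requires that the new file carry a different id. Replacing the entry in place also avoids accumulating stale values for the old file.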