[
https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15512866#comment-15512866
]
Steve Loughran commented on HDFS-7878:
--------------------------------------
I see: if I call getFileStatus, I get back enough info to open the file from
that status, with a guarantee that the version opened is the one the status
check saw; otherwise I'd see an error. This is done with a new field, so things
like checksums/etags can be used for object stores, a version number if someone
ever hooked VAXFS up to Hadoop, etc.
I also see why this implies that FileStatus needs to be serializable: I want to
be able to submit something to a server and be confident that the same version
is what gets picked up. I'd highlight YARN app submission as a key use case; it
currently uses timestamps and gets very confused if you overwrite something,
even if its contents are unchanged.
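To illustrate the overwrite problem, here is a minimal sketch in plain Java. It
is not YARN's code: {{ResourceCheckSketch}}, {{Recorded}} and both
{{matchesBy...}} methods are hypothetical names, and the SHA-256 digest merely
stands in for whatever checksum/etag a store would supply.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Sketch only: hypothetical names, not YARN or Hadoop APIs. */
public class ResourceCheckSketch {

    /** What a submitter might record about a resource at submission time. */
    static final class Recorded {
        final long modificationTime;
        final byte[] contentToken;

        Recorded(long modificationTime, byte[] contents) {
            this.modificationTime = modificationTime;
            this.contentToken = sha256(contents);
        }
    }

    /** Timestamp check: any rewrite changes the result, even with identical bytes. */
    static boolean matchesByTimestamp(Recorded r, long currentModificationTime) {
        return r.modificationTime == currentModificationTime;
    }

    /** Content-derived check: only a change in the bytes changes the result. */
    static boolean matchesByContent(Recorded r, byte[] currentContents) {
        return MessageDigest.isEqual(r.contentToken, sha256(currentContents));
    }

    static byte[] sha256(byte[] data) {
        try {
            return MessageDigest.getInstance("SHA-256").digest(data);
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is a required algorithm on every Java platform.
            throw new IllegalStateException(e);
        }
    }
}
```

Re-uploading the same bytes with a newer modification time fails the timestamp
check but passes the content-derived one.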
The other way to expose it would be as a byte[] of version info, so anything
can marshal file version info as {{(path, byte[])}}; {{open(FileStatus status)}}
would just be mapped to {{open(Path, status.versionInfo)}}. I think that could
be more flexible in terms of passing data around, especially as you could
extend the protobuf in things like the AM launch context, that is, in
{{LocalResourceProto}}.
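A rough sketch of that mapping, in plain Java with no Hadoop dependency.
Everything here is hypothetical and for illustration only: {{VersionedStatus}}
stands in for a FileStatus with the extra field, {{SketchStore}} is a toy
in-memory store, and the version token is just a per-path write counter where a
real store might use an inode ID, etag or checksum.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Sketch only: every type and method here is hypothetical, not a Hadoop API. */
public class VersionedOpenSketch {

    /** Stand-in for a FileStatus carrying opaque version bytes. */
    static final class VersionedStatus {
        final String path;
        final byte[] versionInfo;

        VersionedStatus(String path, byte[] versionInfo) {
            this.path = path;
            this.versionInfo = versionInfo.clone();
        }
    }

    /** Toy in-memory store; the version token is a per-path write counter. */
    static final class SketchStore {
        private final Map<String, byte[]> data = new HashMap<>();
        private final Map<String, Integer> versions = new HashMap<>();

        void write(String path, byte[] contents) {
            data.put(path, contents.clone());
            versions.merge(path, 1, Integer::sum); // every write bumps the version
        }

        VersionedStatus getFileStatus(String path) throws IOException {
            return new VersionedStatus(path, encode(currentVersion(path)));
        }

        /** Opens the file only if it is still at the given version. */
        byte[] open(String path, byte[] versionInfo) throws IOException {
            if (!Arrays.equals(encode(currentVersion(path)), versionInfo)) {
                throw new IOException("Version mismatch for " + path);
            }
            return data.get(path).clone();
        }

        /** open(status) is just open(path, bytes) with the fields unpacked. */
        byte[] open(VersionedStatus status) throws IOException {
            return open(status.path, status.versionInfo);
        }

        private int currentVersion(String path) throws IOException {
            Integer v = versions.get(path);
            if (v == null) {
                throw new FileNotFoundException(path);
            }
            return v;
        }

        private static byte[] encode(int version) {
            return ByteBuffer.allocate(4).putInt(version).array();
        }
    }
}
```

Because the caller only ever handles {{(path, byte[])}}, the pair can be copied
into any message format without the receiver needing the status class itself.
Note that the counter used here changes on every write; a checksum-based token
would stay the same across a rewrite with identical contents.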
Have you raised this with the YARN team?
> API - expose an unique file identifier
> --------------------------------------
>
> Key: HDFS-7878
> URL: https://issues.apache.org/jira/browse/HDFS-7878
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: Sergey Shelukhin
> Assignee: Sergey Shelukhin
> Labels: BB2015-05-TBR
> Attachments: HDFS-7878.01.patch, HDFS-7878.02.patch,
> HDFS-7878.03.patch, HDFS-7878.04.patch, HDFS-7878.05.patch,
> HDFS-7878.06.patch, HDFS-7878.patch
>
>
> See HDFS-487.
> Even though that is resolved as duplicate, the ID is actually not exposed by
> the JIRA it supposedly duplicates.
> INode ID for the file should be easy to expose; alternatively ID could be
> derived from block IDs, to account for appends...
> This is useful e.g. for cache key by file, to make sure cache stays correct
> when file is overwritten.