[ 
https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15512866#comment-15512866
 ] 

Steve Loughran commented on HDFS-7878:
--------------------------------------

I see: if I do getFileStatus, I get back enough info that I can open the file 
from that status, with a guarantee that the version opened is the one from the 
status check; otherwise I'd see an error. And because this is done with a new 
field, things like checksums/etags can be taken from object stores, a version 
number if someone ever hooked VAXFS up to Hadoop, etc.
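
A minimal sketch of that contract, in plain Java rather than the Hadoop 
FileSystem API. The class, the {{Status}} record and the version tokens are all 
hypothetical, with a string standing in for an INode ID, etag or checksum. A 
status captured before an overwrite can no longer be used to open the file, 
even when the new bytes are identical.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory store; NOT the Hadoop FileSystem API. */
public class VersionedOpenSketch {

    /** Hypothetical status: path plus an opaque version token (INode ID, etag, ...). */
    public record Status(String path, long length, String version) {}

    /** Thrown when the file changed between getFileStatus and open. */
    public static class VersionMismatchException extends IOException {
        public VersionMismatchException(String msg) { super(msg); }
    }

    private record Entry(byte[] data, String version) {}

    private final Map<String, Entry> files = new HashMap<>();
    private long nextVersion = 1;

    /** Every write, even of identical bytes, gets a fresh version token. */
    public void write(String path, byte[] data) {
        files.put(path, new Entry(data.clone(), "v" + nextVersion++));
    }

    public Status getFileStatus(String path) throws FileNotFoundException {
        Entry e = files.get(path);
        if (e == null) throw new FileNotFoundException(path);
        return new Status(path, e.data().length, e.version());
    }

    /** Opens exactly the version described by the status, or fails. */
    public byte[] open(Status status) throws IOException {
        Entry e = files.get(status.path());
        if (e == null) throw new FileNotFoundException(status.path());
        if (!e.version().equals(status.version())) {
            throw new VersionMismatchException(status.path() + ": expected "
                + status.version() + ", found " + e.version());
        }
        return e.data().clone();
    }

    public static void main(String[] args) throws IOException {
        VersionedOpenSketch fs = new VersionedOpenSketch();
        byte[] hello = "hello".getBytes(StandardCharsets.UTF_8);
        fs.write("/data/part-0", hello);
        Status st = fs.getFileStatus("/data/part-0");
        System.out.println(new String(fs.open(st), StandardCharsets.UTF_8));
        fs.write("/data/part-0", hello);   // overwrite with the same bytes
        try {
            fs.open(st);                   // the old status is now stale
        } catch (VersionMismatchException ex) {
            System.out.println("stale status: " + ex.getMessage());
        }
    }
}
```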

I also see why this implies that FileStatus needs to be serializable: I want to 
be able to submit something to a server and be confident that it is exactly 
what gets picked up. I'd highlight YARN app submission as a key use case; it 
currently uses timestamps and gets very confused if you overwrite something, 
even if its contents are unchanged.

The other way to expose it would be as a byte[] of version info, so anything 
can marshal file version info as {{(path, byte[])}}; open(FileStatus status) 
would just be mapped to {{open(Path, status.versionInfo)}}. I think that could 
be more flexible in terms of passing data around, especially as you could 
extend the protobufs in things like the AM launch context, that is, in 
{{LocalResourceProto}}. 
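
A rough sketch of the byte[] variant, again with hypothetical names 
({{VersionedRef}}, {{VersionedOpener}}, {{Status}}) and a hand-rolled 
DataOutputStream encoding standing in for a protobuf field, such as one that 
might be added to {{LocalResourceProto}}. The open(Status) overload only 
delegates to open(path, versionInfo), and the (path, byte[]) pair survives a 
round trip through bytes, which is what submitting it to a server would need.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical sketch: version info as an opaque byte[] travelling with the path. */
public class VersionRefSketch {

    /** The (path, byte[]) pair that anything can marshal. */
    public record VersionedRef(String path, byte[] versionInfo) {

        public byte[] toBytes() throws IOException {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(bytes)) {
                out.writeUTF(path);
                out.writeInt(versionInfo.length);
                out.write(versionInfo);
            }
            return bytes.toByteArray();
        }

        public static VersionedRef fromBytes(byte[] data) throws IOException {
            try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
                String path = in.readUTF();
                byte[] version = new byte[in.readInt()];
                in.readFully(version);
                return new VersionedRef(path, version);
            }
        }
    }

    /** Hypothetical status carrying the opaque versionInfo field. */
    public record Status(String path, byte[] versionInfo) {}

    /** Hypothetical filesystem-side API: the byte[] overload is the primitive. */
    public interface VersionedOpener {
        InputStream open(String path, byte[] versionInfo) throws IOException;

        /** open(Status) is just mapped onto open(path, status.versionInfo). */
        default InputStream open(Status status) throws IOException {
            return open(status.path(), status.versionInfo());
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] current = "etag-42".getBytes(StandardCharsets.UTF_8);
        // Toy opener: succeeds only if the caller's version matches the current one.
        VersionedOpener fs = (path, versionInfo) -> {
            if (!Arrays.equals(versionInfo, current)) {
                throw new IOException(path + " has changed since it was stat'ed");
            }
            return new ByteArrayInputStream(new byte[0]);
        };
        // Client side: marshal the (path, versionInfo) pair for submission.
        Status status = new Status("/apps/job.jar", current.clone());
        byte[] wire = new VersionedRef(status.path(), status.versionInfo()).toBytes();
        // Server side: unmarshal and open exactly that version.
        VersionedRef ref = VersionedRef.fromBytes(wire);
        fs.open(status).close();   // the FileStatus-style entry point delegates
        try (InputStream in = fs.open(ref.path(), ref.versionInfo())) {
            System.out.println("opened " + ref.path());
        }
    }
}
```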

Have you raised this with the YARN team? 

> API - expose an unique file identifier
> --------------------------------------
>
>                 Key: HDFS-7878
>                 URL: https://issues.apache.org/jira/browse/HDFS-7878
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Sergey Shelukhin
>            Assignee: Sergey Shelukhin
>              Labels: BB2015-05-TBR
>         Attachments: HDFS-7878.01.patch, HDFS-7878.02.patch, 
> HDFS-7878.03.patch, HDFS-7878.04.patch, HDFS-7878.05.patch, 
> HDFS-7878.06.patch, HDFS-7878.patch
>
>
> See HDFS-487.
> Even though that is resolved as a duplicate, the ID is actually not exposed by 
> the JIRA it supposedly duplicates.
> The INode ID for the file should be easy to expose; alternatively, the ID 
> could be derived from block IDs, to account for appends...
> This is useful e.g. as a per-file cache key, to make sure the cache stays 
> correct when the file is overwritten.
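
As an illustration of the cache-key use case, a sketch assuming some per-file 
ID (such as the INode ID) is already available to the client; how it is 
obtained is left out, and {{FileIdCacheSketch}} is a hypothetical name. Keying 
on (path, fileId) means an overwritten file gets a fresh key, so a stale entry 
is never served.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical cache keyed by (path, fileId); fileId stands in for the INode ID. */
public class FileIdCacheSketch<V> {

    /** Cache key: a path alone is not enough once files can be overwritten. */
    public record Key(String path, long fileId) {}

    private final Map<Key, V> entries = new HashMap<>();

    /** Returns the cached value for this exact file, loading it on a miss. */
    public V get(String path, long fileId, Supplier<V> loader) {
        return entries.computeIfAbsent(new Key(path, fileId), k -> loader.get());
    }

    /** Drops entries belonging to older incarnations of a path. */
    public void evictStale(String path, long currentFileId) {
        entries.keySet().removeIf(k -> k.path().equals(path) && k.fileId() != currentFileId);
    }

    public int size() { return entries.size(); }

    public static void main(String[] args) {
        FileIdCacheSketch<String> cache = new FileIdCacheSketch<>();
        // Original file, with a made-up ID of 1001.
        System.out.println(cache.get("/warehouse/t/part-0", 1001L, () -> "footer@1001"));
        // Same path after an overwrite: new ID, so the old entry is not reused.
        System.out.println(cache.get("/warehouse/t/part-0", 1002L, () -> "footer@1002"));
        cache.evictStale("/warehouse/t/part-0", 1002L);
        System.out.println(cache.size());
    }
}
```

The evictStale step is optional: without it, entries for overwritten files are 
simply never hit again, but they still occupy memory.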



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)