Steve Loughran commented on HDFS-7878:

I see: if I call getFileStatus, I get back enough info that I can open the 
file from that status, with the guarantee that the version opened is the one 
the status check saw; otherwise I'd see an error. And this is done with a new 
field, so things like checksums/etags can be used for object stores, a version 
number if someone ever hooked VAXFS up to Hadoop, etc.

I also see why this implies that FileStatus needs to be serializable: I want to 
be able to submit something to a server and be confident that is what gets 
picked up. I'd highlight YARN app submission as a key use case; it currently 
uses timestamps and gets very confused if you overwrite something, even if its 
contents are unchanged.

The other way to expose it would be as a byte[] of version info, so anything 
can marshal file version info as {{(path, byte[])}}; open(FileStatus status) 
would just be mapped to {{open(Path, status.versionInfo)}}. I think that could 
be more flexible in terms of passing data around, especially as you could 
extend the protobuf in things like AM launch context, that is, in 
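
A minimal sketch of the {{(path, byte[])}} idea, assuming a toy in-memory store rather than any real Hadoop FileSystem. Every name here ({{VersionedOpenSketch}}, {{Store}}, {{Status}}, {{versionInfo}}) is hypothetical and is not the API in the attached patches; the version bytes are a CRC32 of the contents, standing in for whatever checksum, etag or inode ID a real store would supply.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.zip.CRC32;

class VersionedOpenSketch {

    /** Hypothetical stand-in for FileStatus: a path plus opaque version bytes. */
    static final class Status {
        final String path;
        final byte[] versionInfo;

        Status(String path, byte[] versionInfo) {
            this.path = path;
            this.versionInfo = versionInfo.clone();
        }
    }

    /** Toy in-memory store; the "version" is a checksum of the contents. */
    static final class Store {
        private final Map<String, byte[]> data = new HashMap<>();

        void write(String path, byte[] contents) {
            data.put(path, contents.clone());
        }

        Status getFileStatus(String path) throws IOException {
            byte[] contents = data.get(path);
            if (contents == null) {
                throw new FileNotFoundException(path);
            }
            return new Status(path, checksum(contents));
        }

        /** Opens the file only if it still matches the supplied version bytes. */
        byte[] open(String path, byte[] versionInfo) throws IOException {
            Status current = getFileStatus(path);
            if (!Arrays.equals(current.versionInfo, versionInfo)) {
                throw new IOException("version mismatch for " + path);
            }
            return data.get(path).clone();
        }

        /** open(status) is just a mapping onto open(path, versionInfo). */
        byte[] open(Status status) throws IOException {
            return open(status.path, status.versionInfo);
        }

        private static byte[] checksum(byte[] contents) {
            CRC32 crc = new CRC32();
            crc.update(contents);
            return ByteBuffer.allocate(Long.BYTES).putLong(crc.getValue()).array();
        }
    }

    public static void main(String[] args) throws IOException {
        Store store = new Store();
        store.write("/app/job.jar", new byte[] {1, 2, 3});
        Status submitted = store.getFileStatus("/app/job.jar");

        // Overwriting with identical contents leaves the version bytes unchanged.
        store.write("/app/job.jar", new byte[] {1, 2, 3});
        System.out.println("bytes read: " + store.open(submitted).length);

        // Overwriting with different contents makes the old status stale.
        store.write("/app/job.jar", new byte[] {9, 9, 9});
        try {
            store.open(submitted);
        } catch (IOException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Because the version bytes in this sketch come from the contents rather than a timestamp, an overwrite with identical data does not invalidate an earlier status, which is the behaviour the YARN submission case would want. The {{Status}} object is only a path and a byte array, so it could be carried in any serialized form.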

Have you raised this with the YARN team? 

> API - expose an unique file identifier
> --------------------------------------
>                 Key: HDFS-7878
>                 URL: https://issues.apache.org/jira/browse/HDFS-7878
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Sergey Shelukhin
>            Assignee: Sergey Shelukhin
>              Labels: BB2015-05-TBR
>         Attachments: HDFS-7878.01.patch, HDFS-7878.02.patch, 
> HDFS-7878.03.patch, HDFS-7878.04.patch, HDFS-7878.05.patch, 
> HDFS-7878.06.patch, HDFS-7878.patch
> See HDFS-487.
> Even though that is resolved as duplicate, the ID is actually not exposed by 
> the JIRA it supposedly duplicates.
> INode ID for the file should be easy to expose; alternatively ID could be 
> derived from block IDs, to account for appends...
> This is useful e.g. for cache key by file, to make sure cache stays correct 
> when file is overwritten.
