[ 
https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14353753#comment-14353753
 ] 

Colin Patrick McCabe commented on HDFS-7878:
--------------------------------------------

Thanks for tackling this!

One big concern here, though: having a separate getFileId call will create a 
lot of TOCTOU (time-of-check, time-of-use) race conditions.  What if the file 
is deleted and re-created in between calling getFileStatus and calling 
getFileId?  Then the client ends up caching the wrong file's block locations 
(or other file data).  -1 until we can figure this out.  Sorry for the 
negativity.
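
To make the race concrete, here is a minimal sketch using a toy in-memory 
namespace.  ToyNamespace, Status, and racyLookup are made-up names for 
illustration only, not HDFS APIs, and the "other client" is simulated inline 
rather than with a real second thread.

```java
import java.util.HashMap;
import java.util.Map;

// Toy in-memory namespace; every name here is hypothetical, not an HDFS API.
class ToyNamespace {
    // A status snapshot that carries the ID of the file it was read from.
    static final class Status {
        final long fileId;
        final long length;

        Status(long fileId, long length) {
            this.fileId = fileId;
            this.length = length;
        }
    }

    private final Map<String, Status> files = new HashMap<>();
    private long nextId = 1;

    synchronized void create(String path, long length) {
        files.put(path, new Status(nextId++, length));
    }

    synchronized void delete(String path) {
        files.remove(path);
    }

    // Single call: the ID and the attributes come from the same snapshot.
    synchronized Status getFileStatus(String path) {
        return files.get(path);
    }

    // Separate call: may observe a different file at the same path.
    // (Assumes the path exists; a real API would need to handle absence.)
    synchronized long getFileId(String path) {
        return files.get(path).fileId;
    }
}

class ToctouSketch {
    // Returns {ID of the file whose status was read, ID from the later call}.
    static long[] racyLookup() {
        ToyNamespace ns = new ToyNamespace();
        ns.create("/data/part-0", 100);

        // Time of check: read the status of the original file.
        ToyNamespace.Status status = ns.getFileStatus("/data/part-0");

        // Simulated other client replaces the file between the two calls.
        ns.delete("/data/part-0");
        ns.create("/data/part-0", 200);

        // Time of use: this ID belongs to the replacement file.
        long id = ns.getFileId("/data/part-0");
        return new long[] { status.fileId, id };
    }
}
```

In this sketch the two IDs differ, so anything cached from the first status 
would be keyed under the replacement file's ID.  Reading the ID from the same 
Status object avoids the mismatch.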

Why can't we just put this into the file info somewhere?  I don't think the 
subclass approach is a bad one.  To avoid casting, we could also have an 
accessor in the superclass that returns 0 (or throws an exception) when the ID 
is not available.
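
Roughly what I have in mind, as a sketch only.  FileInfoSketch, 
HdfsFileInfoSketch, NO_FILE_ID, and hasFileId are hypothetical names made up 
for this example, not the existing Hadoop classes, and this shows the 
"returns 0" variant rather than the exception-throwing one.

```java
// Hypothetical base class standing in for the generic file info object.
class FileInfoSketch {
    // Sentinel meaning "this file system does not expose a file ID".
    static final long NO_FILE_ID = 0L;

    private final String path;
    private final long length;

    FileInfoSketch(String path, long length) {
        this.path = path;
        this.length = length;
    }

    String getPath() {
        return path;
    }

    long getLength() {
        return length;
    }

    // Default accessor: callers need no cast, they just get the sentinel.
    long getFileId() {
        return NO_FILE_ID;
    }

    boolean hasFileId() {
        return getFileId() != NO_FILE_ID;
    }
}

// Hypothetical subclass for a file system that does have a stable ID.
class HdfsFileInfoSketch extends FileInfoSketch {
    private final long fileId;

    HdfsFileInfoSketch(String path, long length, long fileId) {
        super(path, length);
        this.fileId = fileId;
    }

    // The ID travels with the rest of the status, so it is read atomically.
    @Override
    long getFileId() {
        return fileId;
    }
}
```

A caller holding a FileInfoSketch can check hasFileId() and use the ID as a 
cache key when it is present, without knowing which subclass it has.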

> API - expose an unique file identifier
> --------------------------------------
>
>                 Key: HDFS-7878
>                 URL: https://issues.apache.org/jira/browse/HDFS-7878
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Sergey Shelukhin
>            Assignee: Sergey Shelukhin
>         Attachments: HDFS-7878.01.patch, HDFS-7878.02.patch, HDFS-7878.patch
>
>
> See HDFS-487.
> Even though that is resolved as duplicate, the ID is actually not exposed by 
> the JIRA it supposedly duplicates.
> INode ID for the file should be easy to expose; alternatively ID could be 
> derived from block IDs, to account for appends...
> This is useful e.g. for cache key by file, to make sure cache stays correct 
> when file is overwritten.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
