[ https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14353753#comment-14353753 ]
Colin Patrick McCabe commented on HDFS-7878:
--------------------------------------------
Thanks for tackling this!
My big concern here is that a separate getFileId call will create a lot of
TOCTOU (time-of-check, time-of-use) race conditions. What if the file is
deleted and re-created between the call to getFileStatus and the call to
getFileId? Then the client pairs the status of the old file with the ID of the
new one and ends up caching the wrong file block locations (or other file
data). -1 until we can figure this out. Sorry for the negativity.
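To make the race concrete, here is a minimal sketch against a toy in-memory
namespace. ToyNamespace, its methods, and the path are all made up for
illustration; this is not the NameNode or the API in the patch. It only shows
how two separate lookups by path can observe two different files.

```java
import java.util.HashMap;
import java.util.Map;

public class ToctouSketch {
    /** Hypothetical stand-in for a namespace; NOT the HDFS NameNode. */
    static final class ToyNamespace {
        private static final class Inode {
            final long id;
            final long length;
            Inode(long id, long length) { this.id = id; this.length = length; }
        }

        private final Map<String, Inode> files = new HashMap<>();
        private long nextId = 1000;

        /** Creates (or replaces) a file; every creation gets a fresh ID. */
        void create(String path, long length) {
            files.put(path, new Inode(nextId++, length));
        }

        void delete(String path) { files.remove(path); }

        /** First lookup by path: plays the role of getFileStatus. */
        long getLength(String path) { return files.get(path).length; }

        /** Second, separate lookup by path: plays the role of getFileId. */
        long getFileId(String path) { return files.get(path).id; }
    }

    /**
     * Returns { length seen by the first call,
     *           ID seen by the second call,
     *           ID of the file the first call actually looked at }.
     */
    static long[] racyLookup() {
        ToyNamespace ns = new ToyNamespace();
        String path = "/data/part-0";
        ns.create(path, 10);
        long originalId = ns.getFileId(path); // recorded only for comparison

        long length = ns.getLength(path);     // time of check

        // Another client deletes and re-creates the file in between.
        ns.delete(path);
        ns.create(path, 99);

        long id = ns.getFileId(path);         // time of use: a different file

        return new long[] { length, id, originalId };
    }

    public static void main(String[] args) {
        long[] r = racyLookup();
        System.out.println("length=" + r[0] + " pairedWithId=" + r[1]
                + " butBelongsToId=" + r[2]);
    }
}
```

A cache keyed on the ID from the second call would store the old file's
length (and, by analogy, its block locations) under the new file's ID.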
Why can't we just put this into the file info somewhere? I don't think the
subclass approach is a bad one. To avoid casting, we could also have an
accessor in the superclass that returns 0 (or throws an exception) when the ID
is not available.
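Roughly what I have in mind, as a sketch only. ToyFileStatus,
ToyHdfsFileStatus, UNKNOWN_FILE_ID, and cacheKey are hypothetical names for
illustration, not the real Hadoop classes. The point is that the ID travels
inside the same object as the rest of the status, so there is no second call
to race against, and callers never need to cast.

```java
public class FileIdAccessorSketch {
    /** Hypothetical superclass; stands in for a generic file status. */
    static class ToyFileStatus {
        /** Sentinel meaning "this file system does not provide an ID". */
        static final long UNKNOWN_FILE_ID = 0L;

        private final String path;
        private final long length;

        ToyFileStatus(String path, long length) {
            this.path = path;
            this.length = length;
        }

        String getPath() { return path; }
        long getLength() { return length; }

        /** Default for file systems without IDs; could throw instead. */
        long getFileId() { return UNKNOWN_FILE_ID; }
    }

    /** Hypothetical subclass for a file system that does have IDs. */
    static class ToyHdfsFileStatus extends ToyFileStatus {
        private final long fileId;

        ToyHdfsFileStatus(String path, long length, long fileId) {
            super(path, length);
            this.fileId = fileId;
        }

        @Override
        long getFileId() { return fileId; }
    }

    /**
     * Builds a cache key from a single status snapshot, with no cast.
     * Returns null when no ID is available, so the caller can skip caching.
     */
    static String cacheKey(ToyFileStatus status) {
        long id = status.getFileId();
        if (id == ToyFileStatus.UNKNOWN_FILE_ID) {
            return null;
        }
        return status.getPath() + "#" + id;
    }

    public static void main(String[] args) {
        System.out.println(cacheKey(new ToyHdfsFileStatus("/data/part-0", 10, 16386)));
        System.out.println(cacheKey(new ToyFileStatus("/local/file", 10)));
    }
}
```

Whether the default should be 0 or an exception is a judgment call: a sentinel
is easier on callers that just want to skip caching, while an exception makes
accidental use of a missing ID harder to overlook.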
> API - expose an unique file identifier
> --------------------------------------
>
> Key: HDFS-7878
> URL: https://issues.apache.org/jira/browse/HDFS-7878
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: Sergey Shelukhin
> Assignee: Sergey Shelukhin
> Attachments: HDFS-7878.01.patch, HDFS-7878.02.patch, HDFS-7878.patch
>
>
> See HDFS-487.
> Even though that is resolved as duplicate, the ID is actually not exposed by
> the JIRA it supposedly duplicates.
> INode ID for the file should be easy to expose; alternatively ID could be
> derived from block IDs, to account for appends...
> This is useful e.g. for cache key by file, to make sure cache stays correct
> when file is overwritten.