[jira] [Commented] (HDFS-7878) API - expose an unique file identifier

Chris Douglas (JIRA) Wed, 31 Aug 2016 17:29:47 -0700

[ https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15453857#comment-15453857 ]


Chris Douglas commented on HDFS-7878:
-------------------------------------

bq. One idea behind open(long/InodeId) is to be able to open files 
consistently; e.g. for partial caching (one needs to be sure that the cached 
data and the data read from FS are for the same file, guarding against 
overwrites).

HDFS-9806 needs this API for exactly that reason. We need to open paths 
consistently, and at least know when an external reference diverges from state 
cached in local storage.
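A minimal sketch of that consistency check, assuming an in-memory stand-in for the namespace. `InMemoryNamespace` and `FileIdCache` are hypothetical names, not Hadoop classes. Cached bytes are tagged with the file ID observed when they were read, and an entry is served only while the path still resolves to that same ID.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the file system: maps a path to the ID of the
// file currently stored there (in HDFS this would be the inode ID).
final class InMemoryNamespace {
    private final Map<String, Long> pathToId = new HashMap<>();
    private long nextId = 16386L; // arbitrary starting value for the sketch

    /** Creates or overwrites the file at path; an overwrite gets a fresh ID. */
    long create(String path) {
        long id = nextId++;
        pathToId.put(path, id);
        return id;
    }

    /** Returns the ID currently bound to path, or -1 if nothing is there. */
    long currentId(String path) {
        return pathToId.getOrDefault(path, -1L);
    }
}

// Hypothetical local cache whose entries remember which file they came from.
final class FileIdCache {
    private static final class Entry {
        final long fileId;
        final byte[] data;

        Entry(long fileId, byte[] data) {
            this.fileId = fileId;
            this.data = data;
        }
    }

    private final Map<String, Entry> entries = new HashMap<>();
    private final InMemoryNamespace ns;

    FileIdCache(InMemoryNamespace ns) {
        this.ns = ns;
    }

    void put(String path, long fileId, byte[] data) {
        entries.put(path, new Entry(fileId, data));
    }

    /** Returns cached bytes only if the path still names the same file. */
    byte[] get(String path) {
        Entry e = entries.get(path);
        if (e == null) {
            return null;
        }
        if (e.fileId != ns.currentId(path)) {
            entries.remove(path); // the external reference diverged: drop stale data
            return null;
        }
        return e.data;
    }
}
```

The check-then-use in `get` is itself racy against a real file system, since the file can be replaced between the ID lookup and the read; opening by ID is what closes that gap.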

bq. File ID is easy to propagate between different readers for this purpose, 
but it seems that FileStatus would be rather inconvenient. It forces the caller 
who is dealing with the FS to get the status by name first (which also only 
works if the name is known; in our case we do know the name) and verify that 
fileId is consistent.

FileStatus is more state to pass around, but does it change the calling 
pattern? Some reader still needs to obtain the internal reference first. The v06 
patch was attempting to split the difference between related use cases:

# Obtain a reference to the file, open by reference irrespective of changes in 
the hierarchical namespace (e.g., HDFS {{open("/.reserved/.inode/blahblah")}})
# Obtain a reference to the file, then guard against TOCTOU races (e.g., list a 
directory, and return an entry only if it is the same entity referenced in the first operation)
# Versioning (e.g., refuse to open a reference if the entity returned is less 
recent than the metadata version)
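The three use cases can be sketched against a toy namespace in which paths point at inode IDs. `FileRef`, `RefNamespace`, and the per-inode version counter are hypothetical illustrations, not the v06 patch or an HDFS API; only the idea of resolving by inode (as in {{/.reserved/.inode/<id>}}) comes from HDFS. The version check in `open` cannot fire in this single-copy model; it is there to show where a lagging replica or cache would be rejected.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NoSuchElementException;

// Hypothetical handle: an inode ID plus the metadata version seen when the
// handle was obtained. Not an HDFS class.
final class FileRef {
    final long inodeId;
    final long version;

    FileRef(long inodeId, long version) {
        this.inodeId = inodeId;
        this.version = version;
    }
}

// Hypothetical in-memory namespace: paths map to inode IDs, and inodes are
// also reachable directly by ID.
final class RefNamespace {
    private static final class Inode {
        long version = 1;
        String contents;

        Inode(String contents) { this.contents = contents; }
    }

    private final Map<String, Long> paths = new HashMap<>();
    private final Map<Long, Inode> inodes = new HashMap<>();
    private long nextId = 1;

    /** Creates the file at path; an overwrite deletes the old inode. */
    void create(String path, String contents) {
        long id = nextId++;
        inodes.put(id, new Inode(contents));
        Long old = paths.put(path, id);
        if (old != null) inodes.remove(old);
    }

    /** Renames touch the path table only; the inode ID is unchanged. */
    void rename(String src, String dst) {
        Long id = paths.remove(src);
        if (id == null) return;
        Long old = paths.put(dst, id);
        if (old != null) inodes.remove(old);
    }

    /** Appends in place: same inode ID, newer metadata version. */
    void append(String path, String more) {
        Inode n = inodes.get(lookup(path));
        n.contents += more;
        n.version++;
    }

    private long lookup(String path) {
        Long id = paths.get(path);
        if (id == null) throw new NoSuchElementException(path);
        return id;
    }

    /** Obtains a reference to the file currently at path. */
    FileRef resolve(String path) {
        long id = lookup(path);
        return new FileRef(id, inodes.get(id).version);
    }

    /** Use case 1: open by reference, irrespective of later renames. */
    String open(FileRef ref) {
        Inode n = inodes.get(ref.inodeId);
        if (n == null) throw new NoSuchElementException("inode " + ref.inodeId);
        // Use case 3: refuse an entity older than the version the caller saw.
        if (n.version < ref.version) {
            throw new IllegalStateException("stale inode " + ref.inodeId);
        }
        return n.contents;
    }

    /** Use case 2: TOCTOU check, true only if path still names the referenced file. */
    boolean sameEntity(String path, FileRef ref) {
        Long id = paths.get(path);
        return id != null && id == ref.inodeId;
    }
}
```

A reference taken before a rename still opens the same file afterwards, while an overwrite at the new path both fails the `sameEntity` check and makes the old reference unopenable.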

And so on. Would you prefer the {{open(InodeId)}} style API? Exposing the 
{{InodeId}} type to the user seemed unfortunate, but its semantics are clearer 
to the caller.
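For comparison, a minimal sketch of what a dedicated ID type might look like, assuming a hypothetical `InodeId` class and `ByIdOpener` interface (neither exists in Hadoop); only the {{/.reserved/.inode/<id>}} path form is taken from HDFS.

```java
import java.io.IOException;
import java.io.InputStream;

// Hypothetical opaque ID type sketching the open(InodeId) option; not an
// existing Hadoop class.
final class InodeId {
    private final long value;

    private InodeId(long value) { this.value = value; }

    static InodeId of(long value) { return new InodeId(value); }

    /** HDFS reserved-path form that resolves by inode rather than by name. */
    String reservedPath() { return "/.reserved/.inode/" + value; }

    @Override
    public boolean equals(Object o) {
        return o instanceof InodeId && ((InodeId) o).value == value;
    }

    @Override
    public int hashCode() { return Long.hashCode(value); }

    @Override
    public String toString() { return "InodeId(" + value + ")"; }
}

// Hypothetical signature for the by-ID calling convention.
interface ByIdOpener {
    /** Opens exactly the identified file; no path lookup is involved. */
    InputStream open(InodeId id) throws IOException;
}
```

Wrapping the {{long}} keeps callers from passing an arbitrary number where an inode ID is expected, at the cost of one more public type.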

> API - expose an unique file identifier
> --------------------------------------
>
>                 Key: HDFS-7878
>                 URL: https://issues.apache.org/jira/browse/HDFS-7878
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Sergey Shelukhin
>            Assignee: Sergey Shelukhin
>              Labels: BB2015-05-TBR
>         Attachments: HDFS-7878.01.patch, HDFS-7878.02.patch, 
> HDFS-7878.03.patch, HDFS-7878.04.patch, HDFS-7878.05.patch, 
> HDFS-7878.06.patch, HDFS-7878.patch
>
>
> See HDFS-487.
> Even though that is resolved as a duplicate, the ID is actually not exposed by 
> the JIRA it supposedly duplicates.
> The INode ID for the file should be easy to expose; alternatively, the ID could be 
> derived from block IDs, to account for appends...
> This is useful, e.g., as a per-file cache key, to make sure the cache stays correct 
> when the file is overwritten.


