Chris Douglas commented on HDFS-7878:

bq. is this somehow going to change end user APIs so a path isn't enough to 
refer to things?

No, I'll try to summarize.

This JIRA proposes an API for {{FileSystem}} that exposes HDFS open-by-inode. 
Not every implementation can enforce HDFS semantics precisely, but many can 
support an "open with verification" API that narrows the TOCTOU races 
common to most applications. These would mostly be new APIs.
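
To make "open with verification" concrete, here is a rough sketch against 
the local file system using plain {{java.nio}}, not Hadoop. The names 
{{VerifiedOpenSketch}}, {{Snapshot}}, {{stat}} and {{openVerified}} are 
made up for illustration and are not part of any patch on this JIRA. It 
records identity metadata when the file is first inspected and refuses to 
open if that metadata has changed.

```java
import java.io.IOException;
import java.nio.channels.SeekableByteChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.Objects;

final class VerifiedOpenSketch {

    /** Identity metadata captured at "check" time (hypothetical type, not a Hadoop class). */
    static final class Snapshot {
        final Object fileKey;      // inode-like key; may be null on some platforms
        final long size;
        final long modifiedMillis;

        Snapshot(BasicFileAttributes attrs) {
            this.fileKey = attrs.fileKey();
            this.size = attrs.size();
            this.modifiedMillis = attrs.lastModifiedTime().toMillis();
        }

        boolean matches(Snapshot other) {
            return Objects.equals(fileKey, other.fileKey)
                && size == other.size
                && modifiedMillis == other.modifiedMillis;
        }
    }

    static Snapshot stat(Path p) throws IOException {
        return new Snapshot(Files.readAttributes(p, BasicFileAttributes.class));
    }

    /**
     * Opens p for reading, failing if it no longer matches the earlier snapshot.
     * The re-check is still path-based, so this narrows the race window
     * rather than closing it.
     */
    static SeekableByteChannel openVerified(Path p, Snapshot expected) throws IOException {
        SeekableByteChannel ch = Files.newByteChannel(p, StandardOpenOption.READ);
        boolean ok = false;
        try {
            if (!stat(p).matches(expected)) {
                throw new IOException("file changed since it was inspected: " + p);
            }
            ok = true;
            return ch;
        } finally {
            if (!ok) {
                ch.close();   // do not leak the channel when verification fails
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path p = Files.createTempFile("verified-open", ".txt");
        Files.write(p, "v1".getBytes(StandardCharsets.UTF_8));
        Snapshot seen = stat(p);
        try (SeekableByteChannel ch = openVerified(p, seen)) {
            System.out.println("opened, size=" + ch.size());
        }
        Files.delete(p);
    }
}
```

An implementation that can resolve the identifier directly (as HDFS can 
with an inode ID) would not need the second path lookup at all.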

While v05 proposed a {{long}} as the fileId, implementations other than HDFS 
use different metadata (mostly strings; matching the {{FileStatus}} metadata is 
often possible). If the {{FileHandle}} were exposed as a type, then 
implementations could (opaquely) embed metadata in {{FileStatus}} for 
"consistent" operations. v06 added an {{InodeId}} type (which could be renamed 
{{FileHandle}}, {{PathHandle}}, or {{BikeShed}}).

These metadata can be encoded as a new field in {{FileStatus}}, which would be 
API compatible, but serialized instances would not work across major versions. 
While this seems like a reasonable jump to make in 3.x, it could cause some 
pain. This aspect is discussed in HDFS-6984.

In this JIRA, we're discussing user-facing APIs for consistently opening a 
file, a use case [~sershe] needs in Hive and HDFS-9806 needs for correctness. 
As [~cmccabe] pointed out, we also want to consider consistent handling of 
directories and symlinks as we define this API.

In favor of augmenting {{FileStatus}}: it's simple, it's probably what most 
users expect, and it's serializable. It would be a natural, unsurprising API 
for create/rename/delete/listFileStatus. The downside is that it's 
significantly larger than an 8-byte fileId. The upside is that implementations 
can detect crossed streams (i.e., requesting a fileId from the wrong 
{{FileSystem}}), and {{ViewFS}} can use it for demux.

In favor of {{open(FileHandle)}}: we could add 
{{FileSystem#createHandle(FileStatus)}}, which _may_ use an RPC to generate a 
serializable instance. A handle could then carry only the minimal, 
serializable metadata needed to refer to that inode.
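
As a sketch of that shape, here is a toy in-memory model. Everything in it 
is hypothetical: {{HandleSketch}}, {{ToyFileSystem}}, the 12-byte handle 
encoding, and the exception types are assumptions for illustration, not the 
Hadoop API or the contents of any attached patch. The handle packs a file 
system ID next to the file ID, so a handle presented to the wrong file 
system is rejected, and overwriting a path allocates a new file ID, so a 
stale handle fails instead of silently reading the new file.

```java
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;

final class HandleSketch {

    /** Toy stand-in for FileStatus: a path plus the file ID behind it. */
    static final class FileStatus {
        final String path;
        final long fileId;

        FileStatus(String path, long fileId) {
            this.path = path;
            this.fileId = fileId;
        }
    }

    /** Opaque, serializable handle; callers only ever see bytes. */
    static final class FileHandle {
        private final byte[] bytes;

        FileHandle(byte[] bytes) {
            this.bytes = bytes.clone();
        }

        byte[] toBytes() {
            return bytes.clone();
        }
    }

    static final class ToyFileSystem {
        private final int fsId;
        private long nextFileId = 1;
        private final Map<String, Long> idByPath = new HashMap<>();
        private final Map<Long, String> dataById = new HashMap<>();

        ToyFileSystem(int fsId) {
            this.fsId = fsId;
        }

        /** Creates or overwrites a file; an overwrite allocates a new file ID. */
        void create(String path, String data) {
            Long old = idByPath.remove(path);
            if (old != null) {
                dataById.remove(old);
            }
            long id = nextFileId++;
            idByPath.put(path, id);
            dataById.put(id, data);
        }

        /** Moves a file; its ID is unchanged. The destination is assumed not to exist. */
        void rename(String src, String dst) {
            Long id = idByPath.remove(src);
            if (id == null) {
                throw new IllegalStateException("no such file: " + src);
            }
            idByPath.put(dst, id);
        }

        FileStatus getFileStatus(String path) {
            Long id = idByPath.get(path);
            if (id == null) {
                throw new IllegalStateException("no such file: " + path);
            }
            return new FileStatus(path, id);
        }

        /** Encodes (fsId, fileId) into 12 bytes; a real implementation might need an RPC here. */
        FileHandle createHandle(FileStatus status) {
            ByteBuffer buf = ByteBuffer.allocate(Integer.BYTES + Long.BYTES);
            return new FileHandle(buf.putInt(fsId).putLong(status.fileId).array());
        }

        /** Opens by handle, ignoring whatever the path currently points at. */
        String open(FileHandle handle) {
            ByteBuffer buf = ByteBuffer.wrap(handle.toBytes());
            if (buf.getInt() != fsId) {
                throw new IllegalArgumentException("handle belongs to another file system");
            }
            String data = dataById.get(buf.getLong());
            if (data == null) {
                throw new IllegalStateException("file was replaced or deleted");
            }
            return data;
        }
    }

    public static void main(String[] args) {
        ToyFileSystem fs = new ToyFileSystem(1);
        fs.create("/a", "v1");
        FileHandle handle = fs.createHandle(fs.getFileStatus("/a"));
        fs.rename("/a", "/b");
        System.out.println(fs.open(handle));   // still resolves after the rename
    }
}
```

Whether a handle should survive a rename, or fail once the file is 
modified, is a policy question this sketch does not try to settle.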

ping [~fabbri], [~eddyxu]

> API - expose an unique file identifier
> --------------------------------------
>                 Key: HDFS-7878
>                 URL: https://issues.apache.org/jira/browse/HDFS-7878
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Sergey Shelukhin
>            Assignee: Sergey Shelukhin
>              Labels: BB2015-05-TBR
>         Attachments: HDFS-7878.01.patch, HDFS-7878.02.patch, 
> HDFS-7878.03.patch, HDFS-7878.04.patch, HDFS-7878.05.patch, 
> HDFS-7878.06.patch, HDFS-7878.patch
> See HDFS-487.
> Even though that is resolved as duplicate, the ID is actually not exposed by 
> the JIRA it supposedly duplicates.
> INode ID for the file should be easy to expose; alternatively ID could be 
> derived from block IDs, to account for appends...
> This is useful e.g. for cache key by file, to make sure cache stays correct 
> when file is overwritten.

This message was sent by Atlassian JIRA

