[ https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15512247#comment-15512247 ]
Chris Douglas commented on HDFS-7878:
-------------------------------------

bq. is this somehow going to change end user APIs so a path isn't enough to refer to things?

No, I'll try to summarize.

This JIRA proposes an API for {{FileSystem}} that exposes HDFS open-by-inode. Not every implementation can enforce HDFS semantics precisely, but many can support an "open with verification" API that improves on the TOCTOU races common to most applications. These would mostly be new APIs.

While v05 proposed a {{long}} as the fileId, implementations other than HDFS use different metadata (mostly strings; matching the {{FileStatus}} metadata is often possible). If the {{FileHandle}} were exposed as a type, then implementations could (opaquely) embed metadata in {{FileStatus}} for "consistent" operations. v06 added an {{InodeId}} type (renamed {{FileHandle}} or {{PathHandle}} or {{BikeShed}}).

These metadata can be encoded as a new field in {{FileStatus}}, which would be API compatible, but serialized instances would not work across major versions. While this seems like a reasonable jump to make in 3.x, it could cause some pain. This aspect is discussed in HDFS-6984.

In this JIRA, we're discussing user-facing APIs for consistently opening a file, a use case that [~sershe] needs in Hive and that HDFS-9806 needs for correctness. As [~cmccabe] pointed out, we also want to consider consistent handling of directories and symlinks as we define this API.

In favor of augmenting {{FileStatus}}: it's simple, it's probably what most users expect, and it's serializable. It would be a natural, unsurprising API for create/rename/delete/listFileStatus. That said, it's significantly larger than an 8-byte fileId, though it does let implementations detect crossed streams (i.e., requesting a fileId from the wrong {{FileSystem}}; {{ViewFS}} can use it for demux).

In favor of {{open(FileHandle)}}, we could add {{FileSystem#createHandle(FileStatus)}} that _may_ use an RPC to generate a serializable instance.
These could be the minimal, serializable metadata to refer to that inode.

ping [~fabbri], [~eddyxu]

> API - expose an unique file identifier
> --------------------------------------
>
>                 Key: HDFS-7878
>                 URL: https://issues.apache.org/jira/browse/HDFS-7878
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Sergey Shelukhin
>            Assignee: Sergey Shelukhin
>              Labels: BB2015-05-TBR
>         Attachments: HDFS-7878.01.patch, HDFS-7878.02.patch,
> HDFS-7878.03.patch, HDFS-7878.04.patch, HDFS-7878.05.patch,
> HDFS-7878.06.patch, HDFS-7878.patch
>
>
> See HDFS-487.
> Even though that is resolved as duplicate, the ID is actually not exposed by
> the JIRA it supposedly duplicates.
> INode ID for the file should be easy to expose; alternatively ID could be
> derived from block IDs, to account for appends...
> This is useful e.g. for cache key by file, to make sure cache stays correct
> when file is overwritten.
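
For readers following the thread, the "open with verification" idea in the comment above can be illustrated with a small, self-contained sketch. This is a toy in-memory model, not Hadoop code: {{ToyFileSystem}}, its methods, the {{fsId}} field, and the string serialization format are all hypothetical, and {{createHandle}} takes a path here rather than a {{FileStatus}} to keep the example short. It assumes HDFS-like semantics, where a rename keeps the inode and an overwrite allocates a new one; other stores may only approximate this.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NoSuchElementException;

/** Toy in-memory model of "open with verification". Every name here is hypothetical. */
class OpenByHandleSketch {

    /** Opaque, serializable reference to one inode rather than to a path. */
    static final class PathHandle {
        final String fsId;   // lets an implementation reject a handle from another file system
        final long inodeId;  // HDFS-style id; other stores might embed a string instead

        PathHandle(String fsId, long inodeId) {
            this.fsId = fsId;
            this.inodeId = inodeId;
        }

        String serialize() {
            return fsId + "#" + inodeId;
        }

        static PathHandle deserialize(String s) {
            int sep = s.lastIndexOf('#');
            return new PathHandle(s.substring(0, sep), Long.parseLong(s.substring(sep + 1)));
        }
    }

    static final class ToyFileSystem {
        private final String fsId;
        private final Map<String, Long> namespace = new HashMap<>(); // path -> inode id
        private final Map<Long, String> inodes = new HashMap<>();    // inode id -> contents
        private long nextInodeId = 1;

        ToyFileSystem(String fsId) {
            this.fsId = fsId;
        }

        /** Creates or overwrites a file; an overwrite allocates a fresh inode. */
        void write(String path, String contents) {
            Long old = namespace.put(path, nextInodeId);
            if (old != null) {
                inodes.remove(old);
            }
            inodes.put(nextInodeId, contents);
            nextInodeId++;
        }

        /** Rename keeps the inode, so existing handles stay valid. */
        void rename(String src, String dst) {
            Long id = namespace.remove(src);
            if (id == null) {
                throw new NoSuchElementException(src);
            }
            Long replaced = namespace.put(dst, id);
            if (replaced != null) {
                inodes.remove(replaced);
            }
        }

        PathHandle createHandle(String path) {
            Long id = namespace.get(path);
            if (id == null) {
                throw new NoSuchElementException(path);
            }
            return new PathHandle(fsId, id);
        }

        /**
         * Opens by handle and fails instead of silently returning a different file.
         * A real FileSystem would throw an IOException; unchecked exceptions keep the toy short.
         */
        String open(PathHandle handle) {
            if (!fsId.equals(handle.fsId)) {
                throw new IllegalArgumentException("handle belongs to " + handle.fsId);
            }
            String contents = inodes.get(handle.inodeId);
            if (contents == null) {
                throw new NoSuchElementException("inode " + handle.inodeId + " no longer exists");
            }
            return contents;
        }
    }

    public static void main(String[] args) {
        ToyFileSystem fs = new ToyFileSystem("fs-a");
        fs.write("/data/part-0", "v1");
        // Round-trip through the serialized form, as a cache key or task description might.
        PathHandle h = PathHandle.deserialize(fs.createHandle("/data/part-0").serialize());

        fs.rename("/data/part-0", "/data/part-0.done");
        System.out.println(fs.open(h));       // same inode, still readable after the rename

        fs.write("/data/part-0.done", "v2");  // overwrite: the old inode is gone
        try {
            fs.open(h);
        } catch (NoSuchElementException e) {
            System.out.println("stale handle: " + e.getMessage());
        }
    }
}
```

The point of the sketch is the failure mode: a reader holding a handle either gets the file it originally resolved or an error, never a different file that happens to sit at the same path. The {{fsId}} check is one possible way to detect the "crossed streams" case mentioned in the comment.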