[
https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16172208#comment-16172208
]
Chris Douglas commented on HDFS-7878:
-------------------------------------
Any other feedback on the patch?
As an API for open-exact, one possible implementation could use the {{Options}}
pattern used in FileContext and SequenceFile i.e.,
{code:java}
PathHandle FileStatus::getPathHandle(Options... opts);
{code}
which would imply {{open(PathHandle)}} instead of {{open(FileStatus)}}. A few
folks have raised the idea that ignoring some fields in the {{FileStatus}}
instance could be confusing (if the file were renamed, or its permissions or
modification time changed, etc.). Using {{PathHandle}} explicitly would make that clearer,
and the Options API is one, generic way to surface different FileSystem
capabilities. However, it would mean that serializing a generic {{FileStatus}}
object may not be sufficient to implement the contract, unless its serialized
form supports all possible Options. This could be handled by a method on
FileSystem i.e.,
{code:java}
PathHandle FileSystem::getPathHandle(FileStatus status, Options... opts);
{code}
This is cleaner in some respects, particularly if guaranteeing an option
requires an RPC to get/set state in the FileSystem. It also implies that the
{{PathHandle}} is sufficient to serialize across processes. It does nothing to
prevent crossing {{PathHandle}} instances across FileSystems, unless the
FileSystem serialized a guard on each instance.
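To make the second variant concrete, here is a rough, self-contained sketch. It is not the patch and not an existing Hadoop API: {{HandleOpt}}, {{Status}}, {{ToyFileSystem}}, the {{fsId}} guard, and {{resolves}} are all hypothetical names chosen for illustration. It assumes that passing no options means open-exact, and that the FileSystem stamps an identifier into each handle so that a handle from another FileSystem is rejected.

```java
import java.util.Arrays;
import java.util.EnumSet;

public class PathHandleSketch {
    /** Hypothetical options naming the changes a handle should tolerate. */
    enum HandleOpt { ALLOW_MOVED, ALLOW_CHANGED }

    /** Stand-in for the few FileStatus fields this sketch needs. */
    static final class Status {
        final String path;
        final long fileId;
        final long mtime;

        Status(String path, long fileId, long mtime) {
            this.path = path;
            this.fileId = fileId;
            this.mtime = mtime;
        }
    }

    /** Opaque reference; fsId is the guard against crossing FileSystems. */
    static final class PathHandle {
        final String fsId;
        final long fileId;
        final String path; // null if the file is allowed to have moved
        final long mtime;  // -1 if the content is allowed to have changed

        PathHandle(String fsId, long fileId, String path, long mtime) {
            this.fsId = fsId;
            this.fileId = fileId;
            this.path = path;
            this.mtime = mtime;
        }
    }

    /** Toy FileSystem that creates handles and checks them against current state. */
    static final class ToyFileSystem {
        private final String fsId;

        ToyFileSystem(String fsId) {
            this.fsId = fsId;
        }

        /** No options means open-exact: path and modification time must both match. */
        PathHandle getPathHandle(Status status, HandleOpt... opts) {
            EnumSet<HandleOpt> set = EnumSet.noneOf(HandleOpt.class);
            set.addAll(Arrays.asList(opts));
            return new PathHandle(
                fsId,
                status.fileId,
                set.contains(HandleOpt.ALLOW_MOVED) ? null : status.path,
                set.contains(HandleOpt.ALLOW_CHANGED) ? -1L : status.mtime);
        }

        /** True if the handle still refers to 'current' under its constraints. */
        boolean resolves(PathHandle handle, Status current) {
            if (!fsId.equals(handle.fsId)) {
                throw new IllegalArgumentException(
                    "handle was created by another FileSystem: " + handle.fsId);
            }
            if (handle.fileId != current.fileId) {
                return false;
            }
            if (handle.path != null && !handle.path.equals(current.path)) {
                return false;
            }
            return handle.mtime == -1L || handle.mtime == current.mtime;
        }
    }

    public static void main(String[] args) {
        ToyFileSystem fs = new ToyFileSystem("fs-A");
        Status before = new Status("/data/part-0", 42L, 1000L);
        Status renamed = new Status("/data/part-0.bak", 42L, 1000L);

        PathHandle exact = fs.getPathHandle(before);
        PathHandle movable = fs.getPathHandle(before, HandleOpt.ALLOW_MOVED);

        System.out.println("exact after rename:   " + fs.resolves(exact, renamed));
        System.out.println("movable after rename: " + fs.resolves(movable, renamed));
    }
}
```

In this sketch the handle only carries the fields that the chosen options require, which is the point about serialization above: the handle, not the full {{FileStatus}}, is what crosses process boundaries.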
This is all shuffling around a few APIs; the functionality is similar. Setting
a default of open-exact is probably what most users expect, and what most
FileSystems (S3, WASB) will implement. [~sershe], could you be more explicit
about the use in Hive? Do you need open-by-inodeID to resolve to any version of
the file?
It's worth mentioning that this spec is incomplete. Even if the open includes
guards, the stream is still subject to whatever the FileSystem supports. So a
consistent open could still see stale/updated state.
/cc [~anu], [~andrew.wang], [[email protected]]
> API - expose an unique file identifier
> --------------------------------------
>
> Key: HDFS-7878
> URL: https://issues.apache.org/jira/browse/HDFS-7878
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: Sergey Shelukhin
> Assignee: Sergey Shelukhin
> Labels: BB2015-05-TBR
> Attachments: HDFS-7878.01.patch, HDFS-7878.02.patch,
> HDFS-7878.03.patch, HDFS-7878.04.patch, HDFS-7878.05.patch,
> HDFS-7878.06.patch, HDFS-7878.07.patch, HDFS-7878.08.patch,
> HDFS-7878.09.patch, HDFS-7878.10.patch, HDFS-7878.11.patch,
> HDFS-7878.12.patch, HDFS-7878.patch
>
>
> See HDFS-487.
> Even though that is resolved as duplicate, the ID is actually not exposed by
> the JIRA it supposedly duplicates.
> INode ID for the file should be easy to expose; alternatively ID could be
> derived from block IDs, to account for appends...
> This is useful e.g. for cache key by file, to make sure cache stays correct
> when file is overwritten.