[
https://issues.apache.org/jira/browse/HDFS-12681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16261724#comment-16261724
]
Chris Douglas commented on HDFS-12681:
--------------------------------------
This is ready for review. To page in some context, the purpose of this JIRA is
to allow HDFS to return a user-visible type that does not discard its metadata
payload. This is a necessary followup to HDFS-7878, which allows a
{{FileSystem}} to create a serializable {{PathHandle}} to an entity. HDFS-7878
built on HDFS-6984, which changed {{FileStatus}} serialization to use protocol
buffers instead of {{Writable}} APIs, anticipating a {{PathHandle}} payload
returned from the NameNode.
HDFS-6984 effected the following change:
{noformat}
Before: After:
FileStatus
FileStatus <(COPY) HdfsFileStatus | \
| | |
HdfsFileStatus
LocatedFileStatus <(COPY) HdfsLocatedFileStatus |
\
LocatedFileStatus <(COPY)
HdfsLocatedFileStatus
{noformat}
Before HDFS-6984, the client would obtain an {{HdfsFileStatus}}, then copy a
subset of its fields to a vanilla {{FileStatus}} instance. A comparable
operation was used for {{LocatedFileStatus}}.
Since HDFS-7878 wanted to use the open-by-inode API and the inode was stored in
a {{fileId}} field on {{HdfsFileStatus}}, the copy lost information necessary
to create the handle. HDFS-6984 set up HDFS-7878 to receive and propagate any
{{FileSystem}} metadata to support an {{open(FileStatus)}} call.
Instead, consensus in HDFS-7878 settled on an {{open(PathHandle)}} API. As a
consequence, {{FileStatus}} and its subclass {{LocatedFileStatus}} did not have
a generic way to store a {{PathHandle}}. This was fine for {{HdfsFileStatus}},
since it is a subtype of {{FileStatus}}. However, to create a
{{LocatedFileStatus}}, HDFS-6984 would copy (and lose) this metadata.
The first solution committed (v10) attempted to resolve this by collapsing
{{HdfsFileStatus}} and {{LocatedFileStatus}} into a single class:
{noformat}
FileStatus
|
LocatedFileStatus
|
HdfsFileStatus(+HdfsLocatedFileStatus)
{noformat}
If clients did not rely on the type to distinguish whether they need to call
{{FileSystem::getFileBlockLocations}}, and instead made the (more efficient)
single-RPC call that returned {{LocatedFileStatus}}, this is fine.
Unfortunately, as demonstrated, clients use the 2-RPC variant too often for
this approach to be viable.
Consequently, v13 proposes the following refactoring:
{noformat}
FileStatus HdfsFileStatus
| \ / /
| HdfsNamedFileStatus /
LocatedFileStatus /
\ /
HdfsLocatedFileStatus
{noformat}
{{HdfsFileStatus}} is an interface and its functionality is split across two
concrete types.
Clients can distinguish between the two instances by type, unlike in v10. This
also supports the {{FileSystem}} carrying metadata in its own type, so it is
sufficient to implement HDFS-7878. The fact that {{HdfsLocatedFileStatus}} is
not a subtype of {{HdfsNamedFileStatus}} should be irrelevant to user code,
since neither is user-facing and a client can still pass a mix of
{{FileStatus}} objects as required. This should be backwards-compatible.
Pulling methods into an {{HdfsFileStatus}} interface minimizes changes to HDFS.
It is necessary because {{FileStatus}} is a user-facing, concrete class.
Extracting an interface from {{FileStatus}} would have limited applicability,
so this adds a unit test that verifies that {{HdfsFileStatus}} remains a
superset. Alternatively, one could add a new interface implemented by
{{FileStatus}} and extended by HdfsFileStatus. This is slightly "cleaner", but
the unit test is still necessary because user code only uses {{FileStatus}}.
> Make HdfsLocatedFileStatus a subtype of LocatedFileStatus
> ---------------------------------------------------------
>
> Key: HDFS-12681
> URL: https://issues.apache.org/jira/browse/HDFS-12681
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: Chris Douglas
> Assignee: Chris Douglas
> Priority: Minor
> Attachments: HDFS-12681.00.patch, HDFS-12681.01.patch,
> HDFS-12681.02.patch, HDFS-12681.03.patch, HDFS-12681.04.patch,
> HDFS-12681.05.patch, HDFS-12681.06.patch, HDFS-12681.07.patch,
> HDFS-12681.08.patch, HDFS-12681.09.patch, HDFS-12681.10.patch,
> HDFS-12681.11.patch, HDFS-12681.12.patch, HDFS-12681.13.patch
>
>
> {{HdfsLocatedFileStatus}} is a subtype of {{HdfsFileStatus}}, but not of
> {{LocatedFileStatus}}. Conversion requires copying common fields and shedding
> unknown data. It would be cleaner and sufficient for {{HdfsFileStatus}} to
> extend {{LocatedFileStatus}}.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]