[ https://issues.apache.org/jira/browse/HDFS-12681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16261724#comment-16261724 ]
Chris Douglas commented on HDFS-12681: -------------------------------------- This is ready for review. To page in some context, the purpose of this JIRA is to allow HDFS to return a user-visible type that does not discard its metadata payload. This is a necessary followup to HDFS-7878, which allows a {{FileSystem}} to create a serializable {{PathHandle}} to an entity. HDFS-7878 built on HDFS-6984, which changed {{FileStatus}} serialization to use protocol buffers instead of {{Writable}} APIs, anticipating a {{PathHandle}} payload returned from the NameNode. HDFS-6984 effected the following change: {noformat} Before: After: FileStatus FileStatus <(COPY) HdfsFileStatus | \ | | | HdfsFileStatus LocatedFileStatus <(COPY) HdfsLocatedFileStatus | \ LocatedFileStatus <(COPY) HdfsLocatedFileStatus {noformat} Before HDFS-6984, the client would obtain an {{HdfsFileStatus}}, then copy a subset of its fields to a vanilla {{FileStatus}} instance. A comparable operation was used for {{LocatedFileStatus}}. Since HDFS-7878 wanted to use the open-by-inode API and the inode was stored in a {{fileId}} field on {{HdfsFileStatus}}, the copy lost information necessary to create the handle. HDFS-6984 set up HDFS-7878 to receive and propagate any {{FileSystem}} metadata to support an {{open(FileStatus)}} call. Instead, consensus in HDFS-7878 settled on an {{open(PathHandle)}} API. As a consequence, {{FileStatus}} and its subclass {{LocatedFileStatus}} did not have a generic way to store a {{PathHandle}}. This was fine for {{HdfsFileStatus}}, since it is a subtype of {{FileStatus}}. However, to create a {{LocatedFileStatus}}, HDFS-6984 would copy (and lose) this metadata. The first solution committed (v10) attempted to resolve this by collapsing {{HdfsFileStatus}} and {{LocatedFileStatus}} into a single class: {noformat} FileStatus | LocatedFileStatus | HdfsFileStatus(+HdfsLocatedFileStatus) {noformat} If clients did not rely on the type to distinguish whether they need to call {{FileSystem::getFileBlockLocations}}, and instead made the (more efficient) single-RPC call that returned {{LocatedFileStatus}}, this is fine. Unfortunately, as demonstrated, clients use the 2-RPC variant too often for this approach to be viable. Consequently, v13 proposes the following refactoring: {noformat} FileStatus HdfsFileStatus | \ / / | HdfsNamedFileStatus / LocatedFileStatus / \ / HdfsLocatedFileStatus {noformat} {{HdfsFileStatus}} is an interface and its functionality is split across two concrete types. Clients can distinguish between the two instances by type, unlike in v10. This also supports the {{FileSystem}} carrying metadata in its own type, so it is sufficient to implement HDFS-7878. The fact that {{HdfsLocatedFileStatus}} is not a subtype of {{HdfsNamedFileStatus}} should be irrelevant to user code, since neither is user-facing and a client can still pass a mix of {{FileStatus}} objects as required. This should be backwards-compatible. Pulling methods into an {{HdfsFileStatus}} interface minimizes changes to HDFS. It is necessary because {{FileStatus}} is a user-facing, concrete class. Extracting an interface from {{FileStatus}} would have limited applicability, so this adds a unit test that verifies that {{HdfsFileStatus}} remains a superset. Alternatively, one could add a new interface implemented by {{FileStatus}} and extended by HdfsFileStatus. This is slightly "cleaner", but the unit test is still necessary because user code only uses {{FileStatus}}. > Make HdfsLocatedFileStatus a subtype of LocatedFileStatus > --------------------------------------------------------- > > Key: HDFS-12681 > URL: https://issues.apache.org/jira/browse/HDFS-12681 > Project: Hadoop HDFS > Issue Type: Bug > Reporter: Chris Douglas > Assignee: Chris Douglas > Priority: Minor > Attachments: HDFS-12681.00.patch, HDFS-12681.01.patch, > HDFS-12681.02.patch, HDFS-12681.03.patch, HDFS-12681.04.patch, > HDFS-12681.05.patch, HDFS-12681.06.patch, HDFS-12681.07.patch, > HDFS-12681.08.patch, HDFS-12681.09.patch, HDFS-12681.10.patch, > HDFS-12681.11.patch, HDFS-12681.12.patch, HDFS-12681.13.patch > > > {{HdfsLocatedFileStatus}} is a subtype of {{HdfsFileStatus}}, but not of > {{LocatedFileStatus}}. Conversion requires copying common fields and shedding > unknown data. It would be cleaner and sufficient for {{HdfsFileStatus}} to > extend {{LocatedFileStatus}}. -- This message was sent by Atlassian JIRA (v6.4.14#64029) --------------------------------------------------------------------- To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org