[ 
https://issues.apache.org/jira/browse/HDFS-12681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16261724#comment-16261724
 ] 

Chris Douglas commented on HDFS-12681:
--------------------------------------

This is ready for review. To page in some context, the purpose of this JIRA is 
to allow HDFS to return a user-visible type that does not discard its metadata 
payload. This is a necessary followup to HDFS-7878, which allows a 
{{FileSystem}} to create a serializable {{PathHandle}} to an entity. HDFS-7878 
built on HDFS-6984, which changed {{FileStatus}} serialization to use protocol 
buffers instead of {{Writable}} APIs, anticipating a {{PathHandle}} payload 
returned from the NameNode.

HDFS-6984 effected the following change:
{noformat}
Before:                                            After:
                                                         FileStatus
      FileStatus    <(COPY)     HdfsFileStatus               |     \
          |                          |                       |      
HdfsFileStatus
  LocatedFileStatus <(COPY) HdfsLocatedFileStatus            |                  
  \
                                                     LocatedFileStatus <(COPY) 
HdfsLocatedFileStatus
{noformat}
Before HDFS-6984, the client would obtain an {{HdfsFileStatus}}, then copy a 
subset of its fields to a vanilla {{FileStatus}} instance. A comparable 
operation was used for {{LocatedFileStatus}}.

Since HDFS-7878 wanted to use the open-by-inode API and the inode was stored in 
a {{fileId}} field on {{HdfsFileStatus}}, the copy lost information necessary 
to create the handle. HDFS-6984 set up HDFS-7878 to receive and propagate any 
{{FileSystem}} metadata to support an {{open(FileStatus)}} call.

Instead, consensus in HDFS-7878 settled on an {{open(PathHandle)}} API. As a 
consequence, {{FileStatus}} and its subclass {{LocatedFileStatus}} did not have 
a generic way to store a {{PathHandle}}. This was fine for {{HdfsFileStatus}}, 
since it is a subtype of {{FileStatus}}. However, to create a 
{{LocatedFileStatus}}, HDFS-6984 would copy (and lose) this metadata.

The first solution committed (v10) attempted to resolve this by collapsing 
{{HdfsFileStatus}} and {{LocatedFileStatus}} into a single class:
{noformat}
      FileStatus
          |
  LocatedFileStatus
          |
   HdfsFileStatus(+HdfsLocatedFileStatus)
{noformat}
If clients did not rely on the type to distinguish whether they need to call 
{{FileSystem::getFileBlockLocations}}, and instead made the (more efficient) 
single-RPC call that returned {{LocatedFileStatus}}, this is fine. 
Unfortunately, as demonstrated, clients use the 2-RPC variant too often for 
this approach to be viable.

Consequently, v13 proposes the following refactoring:
{noformat}
      FileStatus                 HdfsFileStatus
          |     \                   /   /
          |      HdfsNamedFileStatus   /
  LocatedFileStatus                   /
               \                     /
                HdfsLocatedFileStatus
{noformat}
{{HdfsFileStatus}} is an interface and its functionality is split across two 
concrete types.

Clients can distinguish between the two instances by type, unlike in v10. This 
also supports the {{FileSystem}} carrying metadata in its own type, so it is 
sufficient to implement HDFS-7878. The fact that {{HdfsLocatedFileStatus}} is 
not a subtype of {{HdfsNamedFileStatus}} should be irrelevant to user code, 
since neither is user-facing and a client can still pass a mix of 
{{FileStatus}} objects as required. This should be backwards-compatible.

Pulling methods into an {{HdfsFileStatus}} interface minimizes changes to HDFS. 
It is necessary because {{FileStatus}} is a user-facing, concrete class. 
Extracting an interface from {{FileStatus}} would have limited applicability, 
so this adds a unit test that verifies that {{HdfsFileStatus}} remains a 
superset. Alternatively, one could add a new interface implemented by 
{{FileStatus}} and extended by HdfsFileStatus. This is slightly "cleaner", but 
the unit test is still necessary because user code only uses {{FileStatus}}.



> Make HdfsLocatedFileStatus a subtype of LocatedFileStatus
> ---------------------------------------------------------
>
>                 Key: HDFS-12681
>                 URL: https://issues.apache.org/jira/browse/HDFS-12681
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Chris Douglas
>            Assignee: Chris Douglas
>            Priority: Minor
>         Attachments: HDFS-12681.00.patch, HDFS-12681.01.patch, 
> HDFS-12681.02.patch, HDFS-12681.03.patch, HDFS-12681.04.patch, 
> HDFS-12681.05.patch, HDFS-12681.06.patch, HDFS-12681.07.patch, 
> HDFS-12681.08.patch, HDFS-12681.09.patch, HDFS-12681.10.patch, 
> HDFS-12681.11.patch, HDFS-12681.12.patch, HDFS-12681.13.patch
>
>
> {{HdfsLocatedFileStatus}} is a subtype of {{HdfsFileStatus}}, but not of 
> {{LocatedFileStatus}}. Conversion requires copying common fields and shedding 
> unknown data. It would be cleaner and sufficient for {{HdfsFileStatus}} to 
> extend {{LocatedFileStatus}}.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

Reply via email to