[ 
https://issues.apache.org/jira/browse/HDFS-6984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15765957#comment-15765957
 ] 

Andrew Wang commented on HDFS-6984:
-----------------------------------

I think "the serialization can omit fields" breaks the abstraction. If 
we're going to have an open API that takes a FileStatus, the implementation 
should be allowed to use every field expected to be present in a FileStatus 
of that FileSystem. This means serialization that omits fields for efficiency 
isn't supportable, which is why I'd prefer a PathHandle. Regarding TOCTOU, this 
would be addressed by including the PathHandle in the returned FileStatus, 
right?

This discussion is an aside to the matter at hand, though, and we should 
continue it on HDFS-7878. I think we already agreed above that as long as we 
can add new fields to FileStatus, we can satisfy the basic requirements of 
HDFS-7878.

bq. I didn't mean to suggest that HdfsFileStatus should be a public API (with 
all the restrictions on evolving it)....<hole in the PB toolchain>

If we don't intend to make HdfsFileStatus public, what's the point of 
cross-serialization? We also need to qualify the path for an HdfsFileStatus to 
become a FileStatus, so I don't know how zero-copy it can be anyway.

I don't feel *that* strongly about removing Writable, but the nowritable patch 
is simple and to the point, and I still haven't grasped the benefit of keeping 
FileStatus Writable, even via PB. We don't think there are many (any?) apps out 
there using the Writable interface. Cross-serialization doesn't have an 
immediate use case. IMO, HDFS-7878 needs a serializable PathHandle, not a full 
FileStatus.

> In Hadoop 3, make FileStatus serialize itself via protobuf
> ----------------------------------------------------------
>
>                 Key: HDFS-6984
>                 URL: https://issues.apache.org/jira/browse/HDFS-6984
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>    Affects Versions: 3.0.0-alpha1
>            Reporter: Colin P. McCabe
>            Assignee: Colin P. McCabe
>              Labels: BB2015-05-TBR
>         Attachments: HDFS-6984.001.patch, HDFS-6984.002.patch, 
> HDFS-6984.003.patch, HDFS-6984.nowritable.patch
>
>
> FileStatus was a Writable in Hadoop 2 and earlier.  Originally, we used this 
> to serialize it and send it over the wire.  But in Hadoop 2 and later, we 
> have the protobuf {{HdfsFileStatusProto}} which serves to serialize this 
> information.  The protobuf form is preferable, since it allows us to add new 
> fields in a backwards-compatible way.  Another issue is that many subclasses 
> of FileStatus don't override the Writable methods of the superclass, 
> breaking the interface contract that a status written out and read back 
> should be equal to the original status.
> In Hadoop 3, we should just make FileStatus serialize itself via protobuf so 
> that we don't have to deal with these issues.  It's probably too late to do 
> this in Hadoop 2, since user code may be relying on the existing FileStatus 
> serialization there.
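
The contract breakage mentioned in the description can be illustrated with a 
minimal sketch. Everything here is hypothetical and stands in for the real 
classes: BaseStatus, LocatedStatus, and storageHint are invented names, and the 
write/readFields pair only mimics the shape of the Writable methods using plain 
java.io streams, not Hadoop's own types.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class RoundTripDemo {

    // Hypothetical stand-in for a Writable-style base class (not Hadoop's FileStatus).
    static class BaseStatus {
        String path = "";
        long length;

        BaseStatus() {}

        BaseStatus(String path, long length) {
            this.path = path;
            this.length = length;
        }

        // Serializes only the fields the base class knows about.
        void write(DataOutput out) throws IOException {
            out.writeUTF(path);
            out.writeLong(length);
        }

        // Reads back exactly what write() emitted.
        void readFields(DataInput in) throws IOException {
            path = in.readUTF();
            length = in.readLong();
        }
    }

    // Hypothetical subclass that adds a field but inherits write/readFields unchanged.
    static class LocatedStatus extends BaseStatus {
        String storageHint = "";

        LocatedStatus() {}

        LocatedStatus(String path, long length, String storageHint) {
            super(path, length);
            this.storageHint = storageHint;
        }
    }

    // Serializes the status to bytes, then deserializes into a fresh instance.
    static LocatedStatus roundTrip(LocatedStatus original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            original.write(new DataOutputStream(bytes));
            LocatedStatus copy = new LocatedStatus();
            copy.readFields(new DataInputStream(
                    new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        LocatedStatus copy = roundTrip(new LocatedStatus("/data/part-0", 42L, "SSD"));
        // path and length survive; storageHint is silently dropped.
        System.out.println(copy.path + " " + copy.length + " [" + copy.storageHint + "]");
    }
}
```

In this sketch the copy keeps its path and length, but storageHint comes back 
empty because the subclass never serialized it, so the round-tripped object is 
not equal to the original.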



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
