[ https://issues.apache.org/jira/browse/HDFS-6984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15765957#comment-15765957 ]
Andrew Wang commented on HDFS-6984:
-----------------------------------
I think "the serialization can omit fields" is an abstraction breakage. If
we're going to have an open API that takes a FileStatus, the implementation
should be allowed to use all the fields expected to be present for a FileStatus
of that FileSystem. This means serialization that omits fields for efficiency
isn't supportable, which is why I'd prefer a PathHandle. Regarding TOCTOU, this
would be addressed by including the PathHandle in the returned FileStatus,
right?
This discussion is an aside to the matter at hand, though, and we should
continue on HDFS-7878. I think we already agreed above that as long as we can
add new fields to FileStatus, we can satisfy the basic requirements of
HDFS-7878.
bq. I didn't mean to suggest that HdfsFileStatus should be a public API (with
all the restrictions on evolving it)....<hole in the PB toolchain>
If we don't intend to make HdfsFileStatus public, what's the point of
cross-serialization? We also need to qualify the path for an HdfsFileStatus to
become a FileStatus, so I don't know how zero-copy it can be anyway.
I don't feel *that* strongly about removing Writable, but the nowritable patch
is simple and to the point, and I still haven't grasped the benefit of keeping
FileStatus Writable, even via PB. We don't think there are many (any?) apps out
there using the Writable interface. Cross-serialization doesn't have an
immediate use case. HDFS-7878 IMO needs a serializable PathHandle, not a full
FileStatus.
> In Hadoop 3, make FileStatus serialize itself via protobuf
> ----------------------------------------------------------
>
> Key: HDFS-6984
> URL: https://issues.apache.org/jira/browse/HDFS-6984
> Project: Hadoop HDFS
> Issue Type: Improvement
> Affects Versions: 3.0.0-alpha1
> Reporter: Colin P. McCabe
> Assignee: Colin P. McCabe
> Labels: BB2015-05-TBR
> Attachments: HDFS-6984.001.patch, HDFS-6984.002.patch,
> HDFS-6984.003.patch, HDFS-6984.nowritable.patch
>
>
> FileStatus was a Writable in Hadoop 2 and earlier. Originally, we used this
> to serialize it and send it over the wire. But in Hadoop 2 and later, we
> have the protobuf {{HdfsFileStatusProto}} which serves to serialize this
> information. The protobuf form is preferable, since it allows us to add new
> fields in a backwards-compatible way. Another issue is that many subclasses
> of FileStatus already don't override the Writable methods of the superclass,
> breaking the interface contract that a status read back from its own written
> form should be equal to the original status.
> In Hadoop 3, we should just make FileStatus serialize itself via protobuf so
> that we don't have to deal with these issues. It's probably too late to do
> this in Hadoop 2, since user code may be relying on the existing FileStatus
> serialization there.
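
The round-trip problem described in the issue can be illustrated with a
minimal, self-contained sketch. SimpleStatus and OwnerStatus are hypothetical
stand-ins written for this example; they are not Hadoop's FileStatus or
Writable, and only java.io is used. The assumption is a Writable-style pair of
write/readFields methods that a subclass inherits without overriding.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

public class RoundTripDemo {
    /** Hypothetical stand-in for a Writable-style status object. */
    static class SimpleStatus {
        String path = "";
        long length;

        // Serializes only the fields this base class knows about.
        void write(DataOutput out) throws IOException {
            out.writeUTF(path);
            out.writeLong(length);
        }

        // Reads the fields back in the same order they were written.
        void readFields(DataInput in) throws IOException {
            path = in.readUTF();
            length = in.readLong();
        }
    }

    /** Subclass that adds a field but inherits write/readFields unchanged. */
    static class OwnerStatus extends SimpleStatus {
        String owner = "";
    }

    // Writes the status to a byte array and reads it into a fresh instance.
    static OwnerStatus roundTrip(OwnerStatus original) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        original.write(new DataOutputStream(bytes));
        OwnerStatus copy = new OwnerStatus();
        copy.readFields(new DataInputStream(
                new ByteArrayInputStream(bytes.toByteArray())));
        return copy;
    }

    public static void main(String[] args) throws IOException {
        OwnerStatus original = new OwnerStatus();
        original.path = "/user/alice/data";
        original.length = 42L;
        original.owner = "alice";

        OwnerStatus copy = roundTrip(original);
        // path and length survive; owner was never written, so it is lost.
        System.out.println(copy.path + " " + copy.length
                + " owner=[" + copy.owner + "]");
    }
}
```

In this sketch the copy keeps its path and length but comes back with an empty
owner, so it is not equal to the original. A format with explicitly declared
fields avoids silently inheriting a serializer that is unaware of new fields.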