[ https://issues.apache.org/jira/browse/HDFS-5698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13859074#comment-13859074 ]
Todd Lipcon commented on HDFS-5698:
-----------------------------------
With the proposal, we should make sure we still have a way to prevent data loss
when an older version reads an image file written by a newer version. That is to
say, it's still important to have a version number, or to implement the "feature
flags" (HDFS-5223). Otherwise, you may be relying on some new feature stored in
an optional field, and restarting with an old NN would silently ignore that data
and then delete it at the next checkpoint.
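A minimal sketch of the kind of guard described above, assuming the image header records the set of features the image relies on. The feature names, `SUPPORTED_FEATURES`, `check_image_loadable` and `IncompatibleImageError` are all hypothetical and are not taken from HDFS-5223; the point is only that an old NN should refuse to start rather than load an image it cannot fully represent.

```python
from __future__ import annotations


class IncompatibleImageError(Exception):
    """Raised instead of loading an image this build cannot fully represent."""


# Hypothetical feature names; a real NameNode would define its own set.
SUPPORTED_FEATURES = frozenset({"SNAPSHOTS", "ACLS"})


def check_image_loadable(image_features: set[str],
                         supported: frozenset[str] = SUPPORTED_FEATURES) -> None:
    """Refuse to load an image that relies on features this build lacks.

    Failing here is the point: loading anyway would drop the unknown data
    from the in-memory namespace, and the next checkpoint would erase it.
    """
    unknown = set(image_features) - supported
    if unknown:
        raise IncompatibleImageError(
            "image requires unsupported features: " + ", ".join(sorted(unknown)))


check_image_loadable({"SNAPSHOTS"})  # every feature is known: loads normally
try:
    check_image_loadable({"SNAPSHOTS", "XATTRS"})  # written by a newer build
    outcome = "loaded"
except IncompatibleImageError as err:
    outcome = str(err)
print(outcome)  # image requires unsupported features: XATTRS
```

A plain version number gives the same protection more coarsely (any newer version is rejected); a feature set lets an old build accept newer images that happen not to use anything it lacks.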
> Use protobuf to serialize / deserialize FSImage
> -----------------------------------------------
>
> Key: HDFS-5698
> URL: https://issues.apache.org/jira/browse/HDFS-5698
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: Haohui Mai
> Assignee: Haohui Mai
>
> Currently, the code serializes FSImage using in-house serialization
> mechanisms. There are a couple of disadvantages to the current approach:
> # Mixing the responsibilities of namespace reconstruction and serialization /
> deserialization. The current serialization / deserialization code paths
> spend a lot of effort on maintaining compatibility. Worse, they are mixed
> with the complex logic of reconstructing the namespace, which makes the code
> difficult to follow.
> # Poor documentation of the current FSImage format. The format of the FSImage
> is effectively defined by the implementation, so a bug in the implementation
> is a bug in the specification. It also makes writing third-party tools quite
> difficult.
> # Changing schemas is non-trivial. Adding a field to the FSImage requires
> bumping the layout version every time. Bumping the layout version requires
> (1) users to explicitly upgrade their clusters, and (2) new code to maintain
> backward compatibility.
> This jira proposes to use protobuf to serialize the FSImage. Protobuf is
> already used to serialize / deserialize RPC messages in Hadoop.
> Protobuf addresses all of the above problems. It clearly separates the
> responsibilities of serialization and namespace reconstruction. The
> protobuf files document the current format of the FSImage. Developers can
> now add optional fields with ease, since old code can always read a newer
> FSImage.
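The protobuf wire format explains both the compatibility claim in the description and the caveat in the comment at the top. Each field is written as a key, `(field_number << 3) | wire_type`, followed by its value, so a reader can skip a field number it does not recognize instead of failing. The Python sketch below hand-rolls just enough of that format (varints and length-delimited bytes) to show this; it does not use a protobuf library, and the INode record and its field numbers are hypothetical.

```python
from __future__ import annotations

VARINT, LEN = 0, 2  # protobuf wire types: varint and length-delimited


def encode_varint(value: int) -> bytes:
    """Encode a non-negative int as a base-128 varint (low 7 bits first)."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Return (value, next_position) for the varint starting at pos."""
    result, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def encode_fields(fields: dict[int, int | bytes]) -> bytes:
    """Serialize {field_number: value}; ints as varints, bytes length-delimited."""
    out = bytearray()
    for number, value in sorted(fields.items()):
        if isinstance(value, int):
            out += encode_varint((number << 3) | VARINT) + encode_varint(value)
        else:
            out += encode_varint((number << 3) | LEN) + encode_varint(len(value)) + value
    return bytes(out)


def decode_fields(buf: bytes, known: set[int]) -> tuple[dict[int, int | bytes], list[int]]:
    """Keep fields whose numbers are in `known`; skip (and report) the rest."""
    fields: dict[int, int | bytes] = {}
    skipped: list[int] = []
    pos = 0
    while pos < len(buf):
        key, pos = decode_varint(buf, pos)
        number, wire_type = key >> 3, key & 0x7
        if wire_type == VARINT:
            value, pos = decode_varint(buf, pos)
        elif wire_type == LEN:
            length, pos = decode_varint(buf, pos)
            value, pos = buf[pos:pos + length], pos + length
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        if number in known:
            fields[number] = value
        else:
            skipped.append(number)  # unknown field: skipped, not an error
    return fields, skipped


# Hypothetical INode record from a newer NN: 1 = id, 2 = name, 3 = new field.
new_image = encode_fields({1: 16386, 2: b"file.txt", 3: 7})
old_view, skipped = decode_fields(new_image, known={1, 2})  # old NN knows 1, 2
checkpoint = encode_fields(old_view)  # old NN writes back only what it understood
print(old_view, skipped)  # {1: 16386, 2: b'file.txt'} [3]
```

The old reader loads the newer record without error, but the field-3 value never reaches its view, so the checkpoint it writes no longer contains it. Real protobuf libraries may preserve unknown fields when re-serializing the same message object; as far as we understand the checkpoint path, though, the NN writes its image from the rebuilt in-memory namespace, so that preservation would not help here. This is the silent loss that a version number or feature flags are meant to prevent.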