Haohui Mai created HDFS-5698:
--------------------------------
Summary: Use protobuf to serialize / deserialize FSImage
Key: HDFS-5698
URL: https://issues.apache.org/jira/browse/HDFS-5698
Project: Hadoop HDFS
Issue Type: Improvement
Reporter: Haohui Mai
Assignee: Haohui Mai
Currently, the code serializes the FSImage using an in-house serialization
mechanism. The current approach has several disadvantages:
# Mixing the responsibilities of reconstruction and serialization /
deserialization. The current serialization / deserialization code paths devote
a lot of effort to maintaining compatibility. Worse, they are interleaved with
the complex logic of reconstructing the namespace, making the code difficult to
follow.
# Poor documentation of the current FSImage format. The format of the FSImage
is practically defined by the implementation. A bug in the implementation means
a bug in the specification. Furthermore, the lack of a separate specification
makes writing third-party tools quite difficult.
# Changing schemas is non-trivial. Adding a field to the FSImage requires
bumping the layout version every time. Bumping the layout version requires
(1) users to explicitly upgrade their clusters, and (2) new code to maintain
backward compatibility.
This jira proposes to use protobuf to serialize the FSImage. Protobuf is
already used to serialize / deserialize RPC messages in Hadoop.
Protobuf addresses all of the above problems. It clearly separates the
responsibility of serialization from that of reconstructing the namespace. The
protobuf files document the current format of the FSImage. Developers can add
optional fields with ease, since old code skips fields it does not recognize
and can therefore still read the new FSImage.
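To illustrate why adding an optional field does not break old readers, here is
a minimal sketch of the protobuf wire format (varint keys, where the key is
field_number << 3 | wire_type) written by hand in Python. It is not the
proposed FSImage schema and does not use the protobuf library: the record
layout (field 1 as an inode id, field 2 as a name, field 3 as a newly added
optional value) and all function names are hypothetical, chosen only for
illustration.

```python
from typing import Dict, Set, Tuple, Union

WIRE_VARINT = 0  # protobuf wire type for varint-encoded integers
WIRE_LEN = 2     # protobuf wire type for length-delimited bytes


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def decode_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Return (value, next_pos) for the varint starting at buf[pos]."""
    result, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def encode_field(number: int, wire_type: int, payload: Union[int, bytes]) -> bytes:
    """Encode one field as key + value, following the protobuf wire layout."""
    key = encode_varint((number << 3) | wire_type)
    if wire_type == WIRE_VARINT:
        return key + encode_varint(payload)
    if wire_type == WIRE_LEN:
        return key + encode_varint(len(payload)) + payload
    raise ValueError("wire type not handled in this sketch")


def read_known_fields(buf: bytes, known: Set[int]) -> Dict[int, Union[int, bytes]]:
    """Parse a record, keeping known field numbers and skipping the rest."""
    fields: Dict[int, Union[int, bytes]] = {}
    pos = 0
    while pos < len(buf):
        key, pos = decode_varint(buf, pos)
        number, wire_type = key >> 3, key & 0x7
        if wire_type == WIRE_VARINT:
            value, pos = decode_varint(buf, pos)
        elif wire_type == WIRE_LEN:
            length, pos = decode_varint(buf, pos)
            value = buf[pos:pos + length]
            pos += length
        else:
            raise ValueError("wire type not handled in this sketch")
        if number in known:
            fields[number] = value  # unknown fields are parsed past and dropped
    return fields


# Hypothetical record written by "new" code: field 3 did not exist before.
new_record = (
    encode_field(1, WIRE_VARINT, 16385)   # hypothetical inode id
    + encode_field(2, WIRE_LEN, b"user")  # hypothetical inode name
    + encode_field(3, WIRE_VARINT, 7)     # hypothetical new optional field
)

old_view = read_known_fields(new_record, {1, 2})     # reader predating field 3
new_view = read_known_fields(new_record, {1, 2, 3})  # reader that knows field 3
```

Because every field carries its own number and wire type, the old reader can
step over field 3 without knowing what it means, which is what removes the need
for a layout version bump in this case.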
--
This message was sent by Atlassian JIRA
(v6.1.5#6160)