[
https://issues.apache.org/jira/browse/HDFS-5698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13884307#comment-13884307
]
Kihwal Lee commented on HDFS-5698:
----------------------------------
Thanks for running the tests and sharing the numbers. I did some testing in the
past and the loading speed was about 30MB/sec at best. I/O wasn't the
bottleneck. THP and CompressedOOPS helped a bit, but in the end the bottleneck
was Java object creation. Due to the way things were serialized, multi-threaded
loading wasn't feasible.
Now that the inode section and the inode directory section are separated,
parallelism can be added for loading each section. Please share your
implementation ideas. The achievable parallelism may turn out to be far less
than expected due to internal locks, so it would be great to have a rough
prototype and some testing to show what's attainable. Do you already have
numbers for how long it takes to load each section?
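
To make the request for per-section numbers concrete, here is a minimal sketch
of the kind of rough prototype I have in mind. It is not HDFS code: the class
name, the section names, the record counts and loadSection() are all
hypothetical stand-ins. loadSection() only allocates one object per record to
mimic the object-creation cost. The sketch also ignores any dependency between
the two sections and any internal locks, which are exactly the things a real
prototype would have to measure.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelSectionLoadSketch {

    /** Outcome of loading one section: record count and elapsed time. */
    static final class SectionResult {
        final String name;
        final int records;
        final long elapsedNanos;

        SectionResult(String name, int records, long elapsedNanos) {
            this.name = name;
            this.records = records;
            this.elapsedNanos = elapsedNanos;
        }
    }

    /** Hypothetical stand-in for deserialization: one allocation per record. */
    static int loadSection(int recordCount) {
        List<long[]> objects = new ArrayList<>(recordCount);
        for (int i = 0; i < recordCount; i++) {
            objects.add(new long[] { i });
        }
        return objects.size();
    }

    /** Loads every section on its own pool thread and times each one. */
    static Map<String, SectionResult> loadAll(Map<String, Integer> sections,
                                              int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            Map<String, Future<SectionResult>> pending = new LinkedHashMap<>();
            for (Map.Entry<String, Integer> e : sections.entrySet()) {
                final String name = e.getKey();
                final int count = e.getValue();
                pending.put(name, pool.submit(() -> {
                    long start = System.nanoTime();
                    int loaded = loadSection(count);
                    return new SectionResult(name, loaded,
                                             System.nanoTime() - start);
                }));
            }
            // Wait for all sections; get() rethrows any loader failure.
            Map<String, SectionResult> results = new LinkedHashMap<>();
            for (Map.Entry<String, Future<SectionResult>> e : pending.entrySet()) {
                results.put(e.getKey(), e.getValue().get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Section names and sizes are made up for illustration.
        Map<String, Integer> sections = new LinkedHashMap<>();
        sections.put("INODE", 1_000_000);
        sections.put("INODE_DIR", 200_000);

        long start = System.nanoTime();
        Map<String, SectionResult> results = loadAll(sections, 2);
        long wallNanos = System.nanoTime() - start;

        for (SectionResult r : results.values()) {
            System.out.printf("%s: %d records in %d ms%n",
                    r.name, r.records, r.elapsedNanos / 1_000_000);
        }
        System.out.printf("wall clock: %d ms%n", wallNanos / 1_000_000);
    }
}
```

Comparing the per-section times against the wall clock time, and against a run
with a single thread, would give a first idea of how much overlap is really
attainable.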
> Use protobuf to serialize / deserialize FSImage
> -----------------------------------------------
>
> Key: HDFS-5698
> URL: https://issues.apache.org/jira/browse/HDFS-5698
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: Haohui Mai
> Assignee: Haohui Mai
> Attachments: HDFS-5698.000.patch, HDFS-5698.001.patch
>
>
> Currently, the code serializes FSImage using in-house serialization
> mechanisms. There are a couple of disadvantages to the current approach:
> # Mixing the responsibility of reconstruction and serialization /
> deserialization. The current code paths of serialization / deserialization
> have spent a lot of effort on maintaining compatibility. What is worse is
> that they are mixed with the complex logic of reconstructing the namespace,
> making the code difficult to follow.
> # Poor documentation of the current FSImage format. The format of the FSImage
> is practically defined by the implementation. A bug in the implementation
> means a bug in the specification. Furthermore, it also makes writing
> third-party tools quite difficult.
> # Changing schemas is non-trivial. Adding a field in FSImage requires bumping
> the layout version every time. Bumping the layout version requires (1) the
> users to explicitly upgrade the clusters, and (2) adding new code to
> maintain backward compatibility.
> This jira proposes to use protobuf to serialize the FSImage. Protobuf is
> already used to serialize / deserialize RPC messages in Hadoop.
> Protobuf addresses all the above problems. It clearly separates the
> responsibility of serialization and reconstructing the namespace. The
> protobuf files document the current format of the FSImage. The developers now
> can add optional fields with ease, since the old code can always read the new
> FSImage.