[
https://issues.apache.org/jira/browse/HDFS-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matt Foley updated HDFS-1366:
-----------------------------
Attachment: FSImageRead_shortcut_proto.patch
The code has changed since this ticket was opened. In March I did some
experiments, and at that time there was no longer a BlocksMap.checkBlockInfo()
method, and the call sequence was:
{code}
FSImage.loadFSImage()
FSImageFormat.Loader.load()
FSImageFormat.Loader.loadFullNameINodes()
FSDirectory.addToParent()
BlockManager.addINode()
BlocksMap.addINode()
{code}
BlocksMap.addINode() did this:
{code}
BlockInfo addINode(BlockInfo b, INodeFile iNode) {
  BlockInfo info = blocks.get(b);
  if (info != b) {
    info = b;
    blocks.put(info);
  }
  info.setINode(iNode);
  return info;
}
{code}
which could be replaced by
{code}
BlockInfo addINode(BlockInfo b, INodeFile iNode) {
  blocks.put(b);
  b.setINode(iNode);
  return b;
}
{code}
Calling blocks.get() before conditionally calling blocks.put() in this way is
wasteful regardless of whether we are reading the FSImage or calling addINode()
for any other purpose: put and get cost about the same, and the result of just
calling put is identical to that of the code above. I put this change into a
simple proof-of-principle patch (attached, not ready for prime time) and tried
it, but got only a 6% improvement in FSImage load time.
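To make the equivalence argument concrete, here is a minimal, self-contained
sketch. It is not the Hadoop code: Block, BlockSet and the String "owner" field
are hypothetical stand-ins for BlockInfo, the BlocksMap storage and INodeFile,
and the sketch assumes that put() replaces any equal element already stored.
It runs both variants side by side and counts map operations.

```java
import java.util.HashMap;
import java.util.Map;

public class AddINodeSketch {
    /** Hypothetical stand-in for BlockInfo: equality is by block id only. */
    static final class Block {
        final long id;
        String owner; // stand-in for the INodeFile reference
        Block(long id) { this.id = id; }
        @Override public boolean equals(Object o) {
            return o instanceof Block && ((Block) o).id == id;
        }
        @Override public int hashCode() { return Long.hashCode(id); }
    }

    /** Hypothetical stand-in for the map storage; counts operations. */
    static final class BlockSet {
        private final Map<Block, Block> map = new HashMap<>();
        int gets = 0;
        int puts = 0;
        Block get(Block b) { gets++; return map.get(b); }
        // Assumption: put replaces any equal element already stored.
        void put(Block b) { puts++; map.put(b, b); }
        int size() { return map.size(); }
    }

    /** Original shape: look up first, then put unless the same object is stored. */
    static Block addWithGet(BlockSet blocks, Block b, String owner) {
        Block info = blocks.get(b);
        if (info != b) {
            info = b;
            blocks.put(info);
        }
        info.owner = owner;
        return info;
    }

    /** Proposed shape: put unconditionally. */
    static Block addPutOnly(BlockSet blocks, Block b, String owner) {
        blocks.put(b);
        b.owner = owner;
        return b;
    }

    public static void main(String[] args) {
        BlockSet withGet = new BlockSet();
        BlockSet putOnly = new BlockSet();
        for (long id = 0; id < 1000; id++) {
            Block x = addWithGet(withGet, new Block(id), "file-" + id);
            Block y = addPutOnly(putOnly, new Block(id), "file-" + id);
            if (!x.equals(y) || !x.owner.equals(y.owner)) {
                throw new AssertionError("variants disagree for block " + id);
            }
        }
        System.out.println("map operations, get-then-put: "
            + (withGet.gets + withGet.puts));
        System.out.println("map operations, put only:     "
            + (putOnly.gets + putOnly.puts));
    }
}
```

In this sketch the put-only variant performs half as many map operations when
every block is new, which is the case during an image load. That halving
applies only to this one step, so it is consistent with a small overall gain
if most of the load time is spent elsewhere.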
> reduce namenode startup time by optimising checkBlockInfo while loading
> fsimage
> ---------------------------------------------------------------------------------
>
> Key: HDFS-1366
> URL: https://issues.apache.org/jira/browse/HDFS-1366
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: name-node
> Reporter: dhruba borthakur
> Attachments: FSImageRead_shortcut_proto.patch
>
>
> The namenode spends about 10 minutes reading in a 14 GB fsimage file into
> memory and creating all the in-memory data structures. A jstack based
> debugger clearly shows that most of the time during the fsimage load is spent
> in BlocksMap.checkBlockInfo. There is an easy way to optimize this method
> especially for this code path.