[ https://issues.apache.org/jira/browse/HDFS-7784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14319097#comment-14319097 ]
Colin Patrick McCabe commented on HDFS-7784:
--------------------------------------------
Hi Walter, this is an interesting idea.
We have found that GC is a major part of NN startup time. Have you tested with
FSImages larger than 3 GB?
If we are doing a lot of buffering, my concern would be that GC could get worse.
One thing we might consider is a two-thread system, where one thread does
deserialization and puts the results into a BlockingQueue that is read by the
other thread, which does the FSN loading. This would avoid buffering an
enormous amount of data, but still get 2x parallelism.
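
To make the two-thread idea concrete, here is a minimal sketch. It is not the
actual FSImage loader code: the names PipelinedLoader, Record, deserialize and
load are hypothetical, and parsing an "id:name" string stands in for the real
deserialization. The point is the bounded queue: the deserializing thread
blocks once queueCapacity records are waiting, so the amount of buffered data
stays small no matter how large the image is.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicReference;

public class PipelinedLoader {
    /** Hypothetical stand-in for one deserialized fsimage entry. */
    static final class Record {
        final long id;
        final String name;
        Record(long id, String name) { this.id = id; this.name = name; }
    }

    // Sentinel that tells the consumer the producer is finished.
    private static final Record END = new Record(-1L, "");

    // Stand-in for the CPU-heavy deserialization step: parses "id:name".
    static Record deserialize(String raw) {
        int sep = raw.indexOf(':');
        return new Record(Long.parseLong(raw.substring(0, sep)), raw.substring(sep + 1));
    }

    public static Map<Long, String> load(List<String> rawRecords, int queueCapacity) {
        // Bounded queue: put() blocks when full, which caps the buffering.
        BlockingQueue<Record> queue = new ArrayBlockingQueue<>(queueCapacity);
        AtomicReference<RuntimeException> failure = new AtomicReference<>();

        Thread producer = new Thread(() -> {
            try {
                try {
                    for (String raw : rawRecords) {
                        queue.put(deserialize(raw));
                    }
                } catch (RuntimeException e) {
                    failure.set(e); // remember the error for the consumer
                }
                queue.put(END); // always wake the consumer up
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "fsimage-deserializer");
        producer.start();

        // The calling thread is the single consumer, so a plain HashMap
        // is enough: only one thread ever touches the map.
        Map<Long, String> inodeMap = new HashMap<>();
        try {
            for (Record r = queue.take(); r != END; r = queue.take()) {
                inodeMap.put(r.id, r.name);
            }
            producer.join();
        } catch (InterruptedException e) {
            producer.interrupt();
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while loading", e);
        }
        if (failure.get() != null) {
            throw new IllegalStateException("deserialization failed", failure.get());
        }
        return inodeMap;
    }
}
```

Calling PipelinedLoader.load(records, 1024) would run deserialization on the
helper thread and map insertion on the caller. The capacity is a tuning knob:
a small value keeps fewer short-lived objects alive at once, which is the GC
concern above, while a larger one smooths out stalls between the two threads.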
> load fsimage in parallel
> ------------------------
>
> Key: HDFS-7784
> URL: https://issues.apache.org/jira/browse/HDFS-7784
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: namenode
> Reporter: Walter Su
> Assignee: Walter Su
>
> When a single Namenode holds a huge number of files and federation is not
> used, startup/restart is slow. The fsimage loading step takes most of the
> time. fsimage loading can be separated into two parts: deserialization and
> object construction (mostly map insertion). Deserialization takes most of the
> CPU time, so we can do deserialization in parallel and add to the hashmap
> serially. It will significantly reduce the NN start time.