[jira] [Commented] (HDFS-7784) load fsimage in parallel

Colin Patrick McCabe (JIRA) Mon, 21 Dec 2015 11:33:11 -0800

    [ 
https://issues.apache.org/jira/browse/HDFS-7784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15066949#comment-15066949
 ]


Colin Patrick McCabe commented on HDFS-7784:
--------------------------------------------

Thanks, [~kihwal].  Unfortunately, that's what we've seen as well: protobuf 
seems to generate a lot of garbage during startup, causing many full GCs that 
consume a lot of time.  It used to be that you could ignore temporary objects 
as long as you didn't create tenured objects, but it turns out that if there 
are too many temporaries, HotSpot promotes them prematurely into the old 
(tenured) generation.  At this point, it's not clear that parallelization is 
a win for fsimage loading unless we can mitigate that GC problem.

Have you guys looked into using the "javanano" version of protocol buffers?  
See here: https://github.com/google/protobuf/tree/master/javanano

It seems like this would generate a lot less garbage than the "official" PB 
library because it avoids builders in favor of mutable state, uses ints instead 
of enums, uses arrays instead of ArrayList, etc.  I think we should probably 
adopt this on the server side, even if we keep the client side on the existing 
PB library.  This would help with RPC as well, of course.
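
To make the allocation difference concrete, here is a rough sketch of the two 
styles in plain Java.  None of this is the protobuf or javanano API: 
MessageStyleSketch, BuiltEntry, MutableEntry, Kind and the field names are all 
made up for illustration.  It only shows the shape of the trade-off described 
above (builder plus immutable message plus boxed list, versus one mutable 
object with an int tag and a primitive array).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class MessageStyleSketch {
    // Hypothetical enum used by the builder-style message.
    enum Kind { FILE, DIRECTORY }

    // Builder style: each message needs a Builder, the immutable message,
    // a copied list, and (potentially) one boxed Long per element.
    static final class BuiltEntry {
        final Kind kind;
        final List<Long> blockIds;

        private BuiltEntry(Builder b) {
            this.kind = b.kind;
            this.blockIds = Collections.unmodifiableList(new ArrayList<>(b.blockIds));
        }

        static final class Builder {
            private Kind kind = Kind.FILE;
            private final List<Long> blockIds = new ArrayList<>();

            Builder setKind(Kind k) { this.kind = k; return this; }
            // The long is autoboxed to a Long before it is stored.
            Builder addBlockId(long id) { this.blockIds.add(id); return this; }
            BuiltEntry build() { return new BuiltEntry(this); }
        }
    }

    // Mutable style: public fields, int constants instead of an enum,
    // and a primitive array instead of a List. The object can be reused.
    static final class MutableEntry {
        static final int KIND_FILE = 0;
        static final int KIND_DIRECTORY = 1;

        int kind = KIND_FILE;
        long[] blockIds = new long[0];

        // Reset the fields so the same instance can hold the next record.
        void clear() {
            kind = KIND_FILE;
            blockIds = new long[0];
        }
    }

    public static void main(String[] args) {
        BuiltEntry built = new BuiltEntry.Builder()
                .setKind(Kind.DIRECTORY)
                .addBlockId(1001L)
                .addBlockId(1002L)
                .build();

        MutableEntry mutable = new MutableEntry();
        mutable.kind = MutableEntry.KIND_DIRECTORY;
        mutable.blockIds = new long[] {1001L, 1002L};

        System.out.println(built.kind + " " + built.blockIds.size());
        System.out.println(mutable.kind + " " + mutable.blockIds.length);
    }
}
```

Whether this actually reduces GC pressure enough to matter for fsimage loading 
would have to be measured; the sketch only illustrates where the extra 
temporaries come from.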

> load fsimage in parallel
> ------------------------
>
>                 Key: HDFS-7784
>                 URL: https://issues.apache.org/jira/browse/HDFS-7784
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: namenode
>            Reporter: Walter Su
>            Assignee: Walter Su
>            Priority: Minor
>              Labels: BB2015-05-TBR
>         Attachments: HDFS-7784.001.patch, test-20150213.pdf
>
>
> When a single NameNode holds a huge number of files without using federation, 
> startup/restart is slow. The fsimage loading step takes most of the time. 
> fsimage loading can be separated into two parts: deserialization and object 
> construction (mostly map insertion). Deserialization takes most of the CPU 
> time, so we can do deserialization in parallel and add to the hashmap serially. 
> It will significantly reduce the NN start time.
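
The approach in the description (deserialize in parallel, insert into the map 
serially) can be sketched roughly as follows.  This is not the attached patch 
and not HDFS code: ParallelLoadSketch, Inode, the "id:name" record encoding and 
the thread count are all assumptions made up for illustration, with a trivial 
string parse standing in for protobuf deserialization.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelLoadSketch {
    // Hypothetical stand-in for an inode record; not an HDFS class.
    static final class Inode {
        final long id;
        final String name;

        Inode(long id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    // Hypothetical decoder for records encoded as "id:name".
    // A real loader would parse a protobuf message here.
    static Inode deserialize(byte[] raw) {
        String s = new String(raw, StandardCharsets.UTF_8);
        int sep = s.indexOf(':');
        return new Inode(Long.parseLong(s.substring(0, sep)), s.substring(sep + 1));
    }

    static Map<Long, Inode> load(List<byte[]> records, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Phase 1: submit every record for deserialization on the pool.
            List<Future<Inode>> pending = new ArrayList<>(records.size());
            for (byte[] raw : records) {
                Callable<Inode> task = () -> deserialize(raw);
                pending.add(pool.submit(task));
            }

            // Phase 2: only the calling thread touches the map, so a plain
            // HashMap is enough and results are inserted in record order.
            Map<Long, Inode> byId = new HashMap<>();
            for (Future<Inode> f : pending) {
                Inode inode = f.get();
                byId.put(inode.id, inode);
            }
            return byId;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<byte[]> records = Arrays.asList(
                "1:root".getBytes(StandardCharsets.UTF_8),
                "2:user".getBytes(StandardCharsets.UTF_8),
                "3:tmp".getBytes(StandardCharsets.UTF_8));
        Map<Long, Inode> byId = load(records, 2);
        System.out.println(byId.size() + " inodes loaded");
    }
}
```

One task per record keeps the sketch short; a real loader would more likely 
hand each worker a batch of records to keep scheduling overhead down.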



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
