[ https://issues.apache.org/jira/browse/HDFS-1070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13015701#comment-13015701 ]
Tsz Wo (Nicholas), SZE commented on HDFS-1070:
----------------------------------------------

Here is an orthogonal idea for reducing the image size: the NameNode already keeps internal maps from user names and group names to serial numbers (see {{SerialNumberManager}}) in order to save memory. How about we do the same for {{FSImage}}? That is, write the maps at the beginning of {{FSImage}} and then use the serial numbers in the {{INode}} entries. Suppose the saving is 10 bytes per name; since each {{INode}} carries a user name and a group name, that is 20 bytes per {{INode}}. For a namespace with 60 million files/directories, the total saving is 20 bytes × 60 million = 1.2 × 10^9 bytes, i.e. about 1.1 GiB.

> Speedup NameNode image loading and saving by storing local file names
> ---------------------------------------------------------------------
>
>                 Key: HDFS-1070
>                 URL: https://issues.apache.org/jira/browse/HDFS-1070
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: name-node
>            Reporter: Hairong Kuang
>            Assignee: Hairong Kuang
>         Attachments: trunkLocalNameImage.patch, trunkLocalNameImage1.patch,
>                      trunkLocalNameImage3.patch, trunkLocalNameImage4.patch,
>                      trunkLocalNameImage5.patch
>
>
> Currently each inode stores its full path in the fsimage. I'd propose to store the local name instead. In order for each inode to identify its parent, all inodes in a directory tree are stored in the image in in-order. This proposal also requires each directory to store the number of its children in the image.
> This proposal would bring a few benefits, as pointed out below, and would therefore speed up image loading and saving.
> # Remove the overhead of converting the Java-UTF8-encoded local name to a string-represented full path and then to a UTF8-encoded full path when saving an image, and vice versa when loading it.
> # Remove the overhead of traversing the full path when inserting an inode into its parent inode.
> # Reduce the number of temporary Java objects created during image loading or saving, and therefore reduce the GC overhead.
> # Reduce the size of an image.
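A minimal sketch of the serial-number idea from the comment above, assuming a simplified image layout. Everything here is hypothetical: `SerialNumberImageSketch`, `INodeRec`, and the record format are illustrative assumptions, not the real {{FSImage}} format or the {{SerialNumberManager}} API. The user and group tables are written once at the head of the image, and each inode entry then stores two int serial numbers instead of two names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: store user/group names once, refer to them by serial number. */
public class SerialNumberImageSketch {

    /** Hypothetical minimal inode record: only the fields relevant to this idea. */
    public record INodeRec(String localName, String user, String group) {}

    /** Assigns dense serial numbers (0, 1, 2, ...) to distinct strings in first-seen order. */
    static final class SerialTable {
        private final Map<String, Integer> ids = new LinkedHashMap<>();
        int idOf(String s) { return ids.computeIfAbsent(s, k -> ids.size()); }
        List<String> names() { return new ArrayList<>(ids.keySet()); }
    }

    public static byte[] save(List<INodeRec> inodes) {
        SerialTable users = new SerialTable();
        SerialTable groups = new SerialTable();
        // First pass: number every name so the tables can precede the inode entries.
        for (INodeRec n : inodes) { users.idOf(n.user()); groups.idOf(n.group()); }
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            writeTable(out, users.names());
            writeTable(out, groups.names());
            out.writeInt(inodes.size());
            for (INodeRec n : inodes) {
                out.writeUTF(n.localName());
                out.writeInt(users.idOf(n.user()));   // serial number instead of the name
                out.writeInt(groups.idOf(n.group()));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static List<INodeRec> load(byte[] image) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(image))) {
            List<String> users = readTable(in);
            List<String> groups = readTable(in);
            int count = in.readInt();
            List<INodeRec> result = new ArrayList<>(count);
            for (int i = 0; i < count; i++) {
                String name = in.readUTF();
                String user = users.get(in.readInt());   // map serial number back to the name
                String group = groups.get(in.readInt());
                result.add(new INodeRec(name, user, group));
            }
            return result;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static void writeTable(DataOutputStream out, List<String> names) throws IOException {
        out.writeInt(names.size());
        for (String s : names) out.writeUTF(s);
    }

    private static List<String> readTable(DataInputStream in) throws IOException {
        int n = in.readInt();
        List<String> names = new ArrayList<>(n);
        for (int i = 0; i < n; i++) names.add(in.readUTF());
        return names;
    }

    public static void main(String[] args) {
        List<INodeRec> inodes = List.of(
                new INodeRec("user", "hdfs", "supergroup"),
                new INodeRec("alice", "alice", "users"),
                new INodeRec("data.txt", "alice", "users"));
        System.out.println(load(save(inodes)).equals(inodes)); // prints true
    }
}
```

Fixed 4-byte ints keep the sketch simple; a variable-length integer encoding would shrink each reference further when there are only a few distinct users and groups.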
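The local-name proposal in the quoted description can be sketched in the same way. This is a hypothetical illustration, not the layout used by the attached patches: it assumes a depth-first order in which each directory record is immediately followed by its children's records, and it stores a child count per directory (with -1 marking a file, an assumption of this sketch). With that, the loader attaches every inode to its parent directly, without building, parsing, or traversing full paths. `LocalNameImageSketch`, `Node`, `dir`, `file`, and `paths` are all illustrative names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: save inodes by local name plus a per-directory child count. */
public class LocalNameImageSketch {

    /** Hypothetical tree node; children == null marks a file. */
    public static final class Node {
        final String name;
        final List<Node> children;
        Node(String name, List<Node> children) { this.name = name; this.children = children; }
        boolean isDir() { return children != null; }
    }

    public static Node file(String name) { return new Node(name, null); }
    public static Node dir(String name, Node... kids) { return new Node(name, new ArrayList<>(List.of(kids))); }

    // Depth-first save: each record is (local name, child count or -1 for a file),
    // and a directory's record is immediately followed by its children's records.
    static void save(Node node, DataOutputStream out) throws IOException {
        out.writeUTF(node.name);
        out.writeInt(node.isDir() ? node.children.size() : -1);
        if (node.isDir()) {
            for (Node c : node.children) save(c, out);
        }
    }

    // Load: the child count says how many following subtrees belong to this
    // directory, so each inode is attached to its parent without any path lookup.
    static Node load(DataInputStream in) throws IOException {
        String name = in.readUTF();
        int count = in.readInt();
        if (count < 0) return file(name);
        List<Node> kids = new ArrayList<>(count);
        for (int i = 0; i < count; i++) kids.add(load(in));
        return new Node(name, kids);
    }

    public static byte[] toBytes(Node root) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            save(root, out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static Node fromBytes(byte[] image) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(image))) {
            return load(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Full paths of all non-root inodes in save order; used only to check the result. */
    public static List<String> paths(Node root) {
        List<String> out = new ArrayList<>();
        if (root.isDir()) for (Node c : root.children) walk(c, "", out);
        return out;
    }

    private static void walk(Node n, String parentPath, List<String> out) {
        String path = parentPath + "/" + n.name;
        out.add(path);
        if (n.isDir()) for (Node c : n.children) walk(c, path, out);
    }

    public static void main(String[] args) {
        Node root = dir("", dir("user", dir("alice", file("data.txt"))), dir("tmp"));
        System.out.println(paths(fromBytes(toBytes(root))));
        // prints [/user, /user/alice, /user/alice/data.txt, /tmp]
    }
}
```

Recursion depth here equals the directory depth, not the number of inodes, so a recursive loader is acceptable for a sketch; a production loader might use an explicit stack instead.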