[ https://issues.apache.org/jira/browse/HDFS-1070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13017143#comment-13017143 ]
Matt Foley commented on HDFS-1070:
----------------------------------

At Hairong's request, I did some perf testing on the patch. I ran the original trunk and Hairong's patches against a synthetic namespace image with 16M files, two blocks per file, 100 files per directory. It needs about 14GB of JVM to load on the NN. The image used rather short numeric directory names, which decreased the benefit of the patch compared to Hairong's real-world measurements, but the benefits were still quite nice. The FSImage was uncompressed.

There was no observed performance difference between trunkLocalNameImage5.patch and trunkLocalNameImage6.patch, or trunkLocalNameImage6.patch with the above suggested modification.

|| ||Full Path Image||Local Name Image|| % Improvement ||
| Image Size | 3.07GB | 2.27GB | -26% |
| Load Time | 72sec | 60sec | -17% |
| Save Time | 50sec | 38sec | -24% |

A very nice improvement, and well worthwhile. Based on Hairong's measurements, we expect even larger benefits with real-world FSImages, which have longer path names and more files per directory. If it passes full testing, I recommend it for commit.

> Speedup NameNode image loading and saving by storing local file names
> ---------------------------------------------------------------------
>
> Key: HDFS-1070
> URL: https://issues.apache.org/jira/browse/HDFS-1070
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: name-node
> Reporter: Hairong Kuang
> Assignee: Hairong Kuang
> Attachments: trunkLocalNameImage.patch, trunkLocalNameImage1.patch,
> trunkLocalNameImage3.patch, trunkLocalNameImage4.patch,
> trunkLocalNameImage5.patch, trunkLocalNameImage6.patch
>
>
> Currently each inode stores its full path in the fsimage. I'd propose to
> store the local name instead. In order for each inode to identify its parent,
> all inodes in a directory tree are stored in the image in in-order. This
> proposal also requires each directory stores the number of its children in
> image.
> This proposal would bring a few benefits as pointed below and therefore
> speedup the image loading and saving.
> # Remove the overhead of converting java-UTF8 encoded local name to
> string-represented full path then to UTF8 encoded full path when saving to an
> image and vice versa when loading the image.
> # Remove the overhead of traversing the full path when inserting the inode to
> its parent inode.
> # Reduce the number of temporary java objects during the process of image
> loading or saving and therefore reduce the GC overhead.
> # Reduce the size of an image.
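
For readers who want to see the idea in the issue description concretely, here is a minimal sketch in Java. It is not the patch's code and not the real FSImage format. The class `LocalNameImageSketch`, the `Node` type, and the record layout (name, directory flag, child count) are all hypothetical, chosen only for illustration. It writes each directory before its children and stores a child count, so the loader can attach every inode to its parent without parsing or walking a full path. A full-path writer is included only for a size comparison.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the "local name" layout; NOT the HDFS FSImage format.
final class LocalNameImageSketch {

    /** Hypothetical in-memory inode: a local name plus children (none for files). */
    static final class Node {
        final String localName;
        final boolean isDirectory;
        final List<Node> children = new ArrayList<>();

        Node(String localName, boolean isDirectory) {
            this.localName = localName;
            this.isDirectory = isDirectory;
        }

        Node add(Node child) {
            children.add(child);
            return this;
        }
    }

    /** Local-name layout: parent first, then its child count, then each child subtree. */
    static void saveLocal(Node node, DataOutputStream out) throws IOException {
        out.writeUTF(node.localName);
        out.writeBoolean(node.isDirectory);
        if (node.isDirectory) {
            out.writeInt(node.children.size());
            for (Node child : node.children) {
                saveLocal(child, out);
            }
        }
    }

    /** The child count tells the loader which records belong to this directory. */
    static Node loadLocal(DataInputStream in) throws IOException {
        Node node = new Node(in.readUTF(), in.readBoolean());
        if (node.isDirectory) {
            int count = in.readInt();
            for (int i = 0; i < count; i++) {
                node.children.add(loadLocal(in)); // no path lookup needed
            }
        }
        return node;
    }

    /** Full-path layout, for comparison only: every record repeats its ancestors' names. */
    static void saveFull(Node node, String path, DataOutputStream out) throws IOException {
        out.writeUTF(path);
        out.writeBoolean(node.isDirectory);
        for (Node child : node.children) {
            String childPath = path.equals("/") ? "/" + child.localName
                                                : path + "/" + child.localName;
            saveFull(child, childPath, out);
        }
    }

    static byte[] toLocalNameBytes(Node root) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            saveLocal(root, new DataOutputStream(buf));
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static byte[] toFullPathBytes(Node root) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            saveFull(root, "/", new DataOutputStream(buf));
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static Node fromLocalNameBytes(byte[] bytes) {
        try {
            return loadLocal(new DataInputStream(new ByteArrayInputStream(bytes)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In this sketch the saving from local names grows with path depth and the number of entries per directory, which is in line with the expectation above that real-world images benefit more than the synthetic one. The recursion is kept for brevity; a very deep tree would need an explicit stack.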