[ https://issues.apache.org/jira/browse/HDFS-1114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tsz Wo (Nicholas), SZE updated HDFS-1114:
-----------------------------------------

    Attachment: gset20100608.pdf

> 1. Not using supplemental hash function will result in severe clustering
> when we move to sequential block IDs (as only higher bits are used for hash).

It is not a problem in our case because the hashCode() implementation in Block uses both the higher and the lower bits of the block ID.

> 3. In "Arrays" section the GC behavior description was not clear. Not sure
> how the GC behavior is better with arrays?

The GC algorithm traverses the objects to determine which of them can be garbage collected. The GC behavior is better with arrays in the sense that arrays contain fewer references for the collector to traverse.

> 2, 4, 5 & 6

All these items are related to configuration and the hash table length. See the new design doc.

gset20100608.pdf: rewrote Section 7

> Reducing NameNode memory usage by an alternate hash table
> ---------------------------------------------------------
>
>                 Key: HDFS-1114
>                 URL: https://issues.apache.org/jira/browse/HDFS-1114
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: name-node
>            Reporter: Tsz Wo (Nicholas), SZE
>            Assignee: Tsz Wo (Nicholas), SZE
>         Attachments: GSet20100525.pdf, gset20100608.pdf, h1114_20100607.patch
>
>
> NameNode uses a java.util.HashMap to store BlockInfo objects. When there are
> many blocks in HDFS, this map uses a lot of memory in the NameNode. We may
> optimize the memory usage by a light weight hash table implementation.
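
To illustrate the reply to point 1 above, here is a minimal sketch of a hash code that mixes the higher and the lower bits of a 64-bit block ID. It is not the actual Block or GSet code: the class name BlockIdHashSketch, the method names, and the table length of 16 are hypothetical, and the XOR fold is assumed to be the same formula that java.lang.Long#hashCode uses. The slot computation assumes a table whose length is a power of two.

```java
// Illustrative sketch only; the names below are hypothetical and this is
// not the HDFS Block or GSet implementation.
final class BlockIdHashSketch {

    /**
     * Folds the upper 32 bits of a 64-bit ID into the lower 32 bits with XOR,
     * so both halves of the ID contribute to the hash code.
     * (Same formula as java.lang.Long#hashCode.)
     */
    static int foldedHash(long blockId) {
        return (int) (blockId ^ (blockId >>> 32));
    }

    /**
     * Maps a hash code to a slot index. Assumes tableLength is a power of
     * two, so the mask keeps only the low-order bits of the hash.
     */
    static int slot(int hash, int tableLength) {
        return hash & (tableLength - 1);
    }

    public static void main(String[] args) {
        int tableLength = 16; // hypothetical table length
        // Sequential IDs differ in their low-order bits, which the fold
        // preserves, so consecutive IDs map to different slots.
        for (long id = 1000; id < 1008; id++) {
            System.out.println(id + " -> slot " + slot(foldedHash(id), tableLength));
        }
    }
}
```

With this kind of fold, sequential block IDs spread across consecutive slots, because the low-order bits of the ID survive in the hash code. IDs that differ only in their upper 32 bits also get different hash codes.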