[ https://issues.apache.org/jira/browse/HDFS-1114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12875211#action_12875211 ]
Suresh Srinivas commented on HDFS-1114:
---------------------------------------

# Not using a supplemental hash function will result in severe clustering when we move to sequential block IDs (as only the higher bits are used for the hash).
# Why do we need the option to switch between java HashMap and this new implementation?
#* With the new impl, BlockInfo implements the LinkedElement interface. On switching to java HashMap, would it continue to implement this interface and incur the cost of the {{next}} member in BlockInfo?
# In the "Arrays" section, the description of GC behavior was not clear. How is GC behavior better with arrays?
# A static array size for the map simplifies the code, but pushes complexity to the cluster admin by adding one more configuration parameter. This parameter is an internal implementation detail that a cluster admin may not understand or set correctly. If it is configured wrong and the cluster continues to work, the cluster admin may not be aware of the performance degradation.
# I feel we should implement resizing to avoid introducing a config param. Resizing is a rare event on a stable cluster. The NN has enough heap headroom to account for floating garbage and the young generation (YG) guarantee, so availability of memory should not be an issue. In the worst case, a resize may trigger a full GC.
# If we implement resizing, we should also think about the 2^N table size, as it has the potential to waste a lot of memory during doubling, especially considering the millions of entries in the table.

> Reducing NameNode memory usage by an alternate hash table
> ---------------------------------------------------------
>
>                 Key: HDFS-1114
>                 URL: https://issues.apache.org/jira/browse/HDFS-1114
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: name-node
>            Reporter: Tsz Wo (Nicholas), SZE
>            Assignee: Tsz Wo (Nicholas), SZE
>         Attachments: GSet20100525.pdf
>
>
> NameNode uses a java.util.HashMap to store BlockInfo objects. When there are
> many blocks in HDFS, this map uses a lot of memory in the NameNode.
> We may optimize the memory usage by a lightweight hash table implementation.
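
The clustering concern in the first point of the comment can be illustrated with a minimal sketch. It assumes a hypothetical table that takes its bucket index from the upper bits of a 32-bit hash. The class name, the table size, and the `mix` function (multiplicative hashing with the constant 0x9E3779B9) are illustrative assumptions and are not taken from the HDFS-1114 patch or the attached design document.

```java
import java.util.HashSet;
import java.util.Set;

// Illustrative only: compares how many buckets sequential IDs occupy
// with and without a supplemental (mixing) hash function.
class HashClusteringSketch {
    // Hypothetical table with 2^10 buckets.
    static final int TABLE_BITS = 10;

    // Fold a 64-bit ID into 32 bits, the same way java.lang.Long#hashCode does.
    static int hashOf(long blockId) {
        return (int) (blockId ^ (blockId >>> 32));
    }

    // Bucket index taken from the top TABLE_BITS bits of the hash.
    static int indexFromHighBits(int hash) {
        return hash >>> (32 - TABLE_BITS);
    }

    // Assumed supplemental hash: multiplicative mixing, which pushes
    // differences in the low bits up into the high bits.
    static int mix(int hash) {
        return hash * 0x9E3779B9;
    }

    // Count the distinct buckets hit by `count` sequential IDs starting at firstId.
    static int bucketsUsed(long firstId, int count, boolean useMix) {
        Set<Integer> used = new HashSet<>();
        for (int i = 0; i < count; i++) {
            int h = hashOf(firstId + i);
            used.add(indexFromHighBits(useMix ? mix(h) : h));
        }
        return used.size();
    }

    public static void main(String[] args) {
        System.out.println("buckets used without mixing: " + bucketsUsed(1L, 1000, false));
        System.out.println("buckets used with mixing:    " + bucketsUsed(1L, 1000, true));
    }
}
```

Without mixing, small sequential IDs differ only in their low bits, so every entry lands in the same bucket. With the assumed mixing step the same IDs spread across most of the table. Any real implementation would need to pick a mixing function that matches how it derives the index from the hash.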