[ 
https://issues.apache.org/jira/browse/HDFS-6658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14175788#comment-14175788
 ] 

Konstantin Shvachko commented on HDFS-6658:
-------------------------------------------

Pretty neat data structure, Amir. Could be an improvement over the current 
structure, introduced way back in HADOOP-1687.
With a BitSet you will need about 12K of contiguous space in RAM for every 
100,000-block report. Sounds reasonable.
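For reference, the 12K is just one bit per block: BitSet is backed by a 
long[], so 100,000 bits round up to 1,563 words, about 12.5KB of contiguous 
heap. A quick back-of-the-envelope sketch (mine, not anything from the patch):

{code:java}
import java.util.BitSet;

// Rough check of the ~12K figure: one bit per block in a 100,000-block report.
public class BitSetSizing {
    public static void main(String[] args) {
        BitSet reported = new BitSet(100_000); // one bit per reported block
        reported.set(42);                      // e.g. block at index 42 seen

        // BitSet stores bits in a long[] (64 bits per word), so:
        int words = (100_000 + 63) / 64;       // 1,563 longs
        System.out.println(words * 8L + " bytes"); // 12,504 ~= 12K contiguous
    }
}
{code}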
The only concern is that removing a large number of files, which is typically 
done when the NN gets close to its capacity, does not free the memory used by 
the removed replicas. That memory can be reused for new references, but not 
for anything else, unless some type of garbage collector is introduced. It 
would be interesting to see how this behaves on a cluster over time.
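To illustrate the concern, here is a hypothetical sketch of an index-based 
list with a free list (my own illustration of the general scheme, not the 
code in the attached patch). Freed slots can only be handed back out for new 
replicas; the backing array itself never shrinks without a compaction pass:

{code:java}
import java.util.Arrays;

// Hypothetical index-based replica list: entries are linked through an
// int[] of "next" indexes, and removed entries go onto a free list inside
// the same array. Freed slots are reused for new references, but the
// backing array never shrinks, so a mass delete does not return heap to
// the JVM unless some GC-like compaction is added.
public class IntFreeListStore {
    private int[] next = new int[16]; // next-pointers encoded as indexes
    private int freeHead = -1;        // head of the free list; -1 = empty
    private int used = 0;             // high-water mark of the array

    public int allocate() {
        if (freeHead != -1) {         // reuse a previously freed slot first
            int slot = freeHead;
            freeHead = next[slot];
            return slot;
        }
        if (used == next.length) {    // grow as needed; note: never shrinks
            next = Arrays.copyOf(next, used * 2);
        }
        return used++;
    }

    public void free(int slot) {      // push slot onto the free list; the
        next[slot] = freeHead;        // array keeps its full length, so the
        freeHead = slot;              // memory stays claimed by this store
    }
}
{code}

After deleting, say, half the replicas, allocate() serves new blocks from the 
free list, but the int[] stays at its peak size until something rebuilds it.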

> Namenode memory optimization - Block replicas list 
> ---------------------------------------------------
>
>                 Key: HDFS-6658
>                 URL: https://issues.apache.org/jira/browse/HDFS-6658
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: namenode
>    Affects Versions: 2.4.1
>            Reporter: Amir Langer
>            Assignee: Amir Langer
>         Attachments: BlockListOptimizationComparison.xlsx, HDFS-6658.patch, 
> Namenode Memory Optimizations - Block replicas list.docx
>
>
> Part of the memory consumed by every BlockInfo object in the Namenode is a 
> linked list of block references for every DatanodeStorageInfo (called 
> "triplets"). 
> We propose to change the way we store the list in memory. 
> Using primitive integer indexes instead of object references will reduce the 
> memory needed for every block replica (when compressed oops is disabled), and 
> in our new design the list overhead will be per DatanodeStorageInfo rather 
> than per block replica.
> See the attached design doc for details and evaluation results.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
