[
https://issues.apache.org/jira/browse/HDFS-6658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180682#comment-14180682
]
Konstantin Shvachko commented on HDFS-6658:
-------------------------------------------
I agree: usually people remove data in order to make room for more, and the
freed space usually fills up again within a couple of weeks or months.
I don't know if this answer is good enough. It is for me, but in the end you've
got a bigger cluster.
It would be nice to find a way to detect fully empty arrays of the BlockList
and release them once the last reference is removed. That should be good enough
to avoid a stand-alone thread for garbage collection, or compaction in your
terms.
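As a rough illustration of that idea (names like ChunkedBlockList are made up for this sketch, not HDFS classes): each storage keeps fixed-size int[] chunks with a live-slot counter, and a chunk is dropped eagerly the moment its counter hits zero, so no background compaction thread is required.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch only, not the actual HDFS implementation. */
public class ChunkedBlockList {
    static final int CHUNK_SIZE = 4;        // small for demonstration

    private static final class Chunk {
        final int[] ids = new int[CHUNK_SIZE];
        final boolean[] used = new boolean[CHUNK_SIZE];
        int live;                           // occupied slots in this chunk
    }

    private final List<Chunk> chunks = new ArrayList<>();

    /** Stores blockId, returns an opaque handle (chunk << 16 | slot). */
    public int add(int blockId) {
        for (int c = 0; c < chunks.size(); c++) {
            Chunk ch = chunks.get(c);
            if (ch != null && ch.live < CHUNK_SIZE) {
                for (int s = 0; s < CHUNK_SIZE; s++) {
                    if (!ch.used[s]) {
                        ch.ids[s] = blockId;
                        ch.used[s] = true;
                        ch.live++;
                        return (c << 16) | s;
                    }
                }
            }
        }
        Chunk ch = new Chunk();
        ch.ids[0] = blockId;
        ch.used[0] = true;
        ch.live = 1;
        // reuse a slot left by a previously released chunk, if any
        for (int c = 0; c < chunks.size(); c++) {
            if (chunks.get(c) == null) {
                chunks.set(c, ch);
                return c << 16;
            }
        }
        chunks.add(ch);
        return (chunks.size() - 1) << 16;
    }

    /** Removes the entry; releases the whole chunk when its last slot empties. */
    public void remove(int handle) {
        int c = handle >>> 16, s = handle & 0xFFFF;
        Chunk ch = chunks.get(c);
        ch.used[s] = false;
        if (--ch.live == 0) {
            chunks.set(c, null);            // eager release, no GC thread needed
        }
    }

    /** Number of chunk arrays currently allocated. */
    public int allocatedChunks() {
        int n = 0;
        for (Chunk ch : chunks) if (ch != null) n++;
        return n;
    }
}
```

The point of the sketch is the remove() path: because the per-chunk counter makes "fully empty" detectable in O(1), the array can be released inline on the last removal.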
> Namenode memory optimization - Block replicas list
> ---------------------------------------------------
>
> Key: HDFS-6658
> URL: https://issues.apache.org/jira/browse/HDFS-6658
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: namenode
> Affects Versions: 2.4.1
> Reporter: Amir Langer
> Assignee: Amir Langer
> Attachments: BlockListOptimizationComparison.xlsx, HDFS-6658.patch,
> Namenode Memory Optimizations - Block replicas list.docx
>
>
> Part of the memory consumed by every BlockInfo object in the Namenode is a
> linked list of block references for every DatanodeStorageInfo (called
> "triplets").
> We propose to change the way we store the list in memory.
> Using primitive integer indexes instead of object references will reduce the
> memory needed for every block replica (when compressed oops is disabled) and
> in our new design the list overhead will be per DatanodeStorageInfo and not
> per block replica.
> See the attached design doc for details and evaluation results.
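A minimal sketch of the index-based approach the description alludes to (class and field names here are hypothetical, not taken from the patch): the per-storage replica list is threaded through a primitive int[] "next" array keyed by block index, so each replica costs one int rather than an object reference, and the array itself is the only per-storage overhead.

```java
import java.util.Arrays;

/** Illustrative sketch only, not the HDFS-6658 patch itself. */
public class IndexedReplicaList {
    private static final int NIL = -1;

    private int head = NIL;   // this storage's list head (a block index)
    private final int[] next; // next[blockIdx] -> following block index, or NIL

    public IndexedReplicaList(int capacity) {
        next = new int[capacity];
        Arrays.fill(next, NIL);
    }

    /** Prepends blockIdx to this storage's replica list in O(1). */
    public void add(int blockIdx) {
        next[blockIdx] = head;
        head = blockIdx;
    }

    /** Counts replicas by walking the primitive index chain. */
    public int size() {
        int n = 0;
        for (int i = head; i != NIL; i = next[i]) {
            n++;
        }
        return n;
    }
}
```

With compressed oops disabled an object reference is 8 bytes, so replacing each per-replica reference with a 4-byte int is where the claimed saving would come from in a design of this shape.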
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)