[
https://issues.apache.org/jira/browse/HDFS-6658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14355932#comment-14355932
]
Daryn Sharp commented on HDFS-6658:
-----------------------------------
Sorry, I didn't see your reply, but great minds think alike. I was thinking of
something similar: a mark and sweep of the blocks map based on a DN block
report (BR) serial number. I've long wanted the DN to send its serial during
registration so a full BR can be avoided if the NN is already up to date with
that serial. I have other ideas for it, but I digress. Having no reverse
mapping at all makes things harder, though.
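A minimal sketch of the serial-based mark and sweep described above, assuming each replica entry simply remembers the serial of the last full BR that reported it. The class and method names (SerialMarkSweep, needsFullReport, mark, sweep) are hypothetical, not HDFS APIs, and a boxed HashMap is used only for readability; a memory-conscious NN would not store replicas this way.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Hypothetical sketch: mark-and-sweep of one DN's replicas keyed by BR serial.
// SerialMarkSweep and its methods are illustrative names, not HDFS APIs.
class SerialMarkSweep {
  // blockId -> serial of the last full BR that reported this replica
  private final Map<Long, Long> lastSeenSerial = new HashMap<>();
  // serial of the most recent full BR processed; -1 means none yet
  private long currentSerial = -1;

  // At registration: a full BR is only needed if the DN's serial differs
  // from the one the NN last processed.
  boolean needsFullReport(long dnSerial) {
    return dnSerial != currentSerial;
  }

  // Mark phase: stamp every reported block with the new serial.
  void mark(long serial, long[] reportedBlockIds) {
    currentSerial = serial;
    for (long id : reportedBlockIds) {
      lastSeenSerial.put(id, serial);
    }
  }

  // Sweep phase: remove replicas that the latest BR did not report.
  // Returns how many replicas were removed.
  int sweep() {
    int removed = 0;
    Iterator<Map.Entry<Long, Long>> it = lastSeenSerial.entrySet().iterator();
    while (it.hasNext()) {
      if (it.next().getValue() != currentSerial) {
        it.remove();
        removed++;
      }
    }
    return removed;
  }

  boolean contains(long blockId) {
    return lastSeenSerial.containsKey(blockId);
  }
}
```

Note that the sweep only touches this DN's own entries; without some per-DN (reverse) view of the replicas, the sweep would have to walk the entire blocks map.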
Blocks disappearing without an IBR are more common than you think. At least
one storage fails per day in our environment, and each failure requires quickly
reverse-mapping that storage's blocks for removal. Similarly, all the blocks on
a dead node need to be removed. Any added latency in the NN issuing replication
requests can be dangerous: if an entire rack fails, then losing any additional
node is virtually guaranteed to cause data loss.
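To illustrate why the reverse mapping matters, here is a sketch of a two-way index in which a failed storage is dropped by walking only that storage's own block set rather than scanning every block, and each affected block is reported as having lost a replica so it could be queued for re-replication. ReplicaIndex, failStorage, replicaCount and the string storage IDs are hypothetical stand-ins, not the BlockManager or DatanodeStorageInfo API.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical two-way replica index; names are illustrative only.
class ReplicaIndex {
  // forward map: blockId -> storages holding a replica of it
  private final Map<Long, Set<String>> blockToStorages = new HashMap<>();
  // reverse map: storageId -> blockIds with a replica on that storage
  private final Map<String, Set<Long>> storageToBlocks = new HashMap<>();

  void addReplica(long blockId, String storageId) {
    blockToStorages.computeIfAbsent(blockId, k -> new HashSet<>()).add(storageId);
    storageToBlocks.computeIfAbsent(storageId, k -> new HashSet<>()).add(blockId);
  }

  // Drops every replica on a failed storage using the reverse map, so the
  // cost is proportional to that storage's blocks, not the whole namespace.
  // Returns the blocks that lost a replica (candidates for re-replication).
  Set<Long> failStorage(String storageId) {
    Set<Long> affected = new HashSet<>();
    Set<Long> blocks = storageToBlocks.remove(storageId);
    if (blocks == null) {
      return affected;
    }
    for (long blockId : blocks) {
      blockToStorages.get(blockId).remove(storageId);
      affected.add(blockId);
    }
    return affected;
  }

  int replicaCount(long blockId) {
    Set<String> holders = blockToStorages.get(blockId);
    return holders == null ? 0 : holders.size();
  }
}
```

A dead node is handled the same way by calling failStorage for each of its storages.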
I've shifted my thoughts and experiments to a long-to-long hash from blockId
to the stored block index I've introduced in the patch, to be posted shortly.
A fast hash would allow inodes and storages alike to know only a blockId. The
blockId/size/genstamp would then move into my block replicas map as the head
entry of the chain, ahead of the subsequent replicas. With that, the
intermediate blocks map can probably be eliminated for all but
under-construction (UC) blocks.
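A rough sketch of the kind of primitive long-to-long hash described above, assuming open addressing with linear probing, a load factor of at most 1/2, and the MurmurHash3 64-bit finalizer as the mixing function. LongLongHashMap and its methods are hypothetical; the patch's actual structure may differ. Values stand in for the stored block index, and a caller-chosen sentinel is returned for absent keys.

```java
// Hypothetical primitive long -> long map (no boxing, no per-entry objects).
// Open addressing, linear probing, power-of-two capacity, load factor <= 1/2.
class LongLongHashMap {
  private long[] keys;
  private long[] values;
  private boolean[] used;   // occupancy flags, so no blockId value is reserved
  private int size;
  private final long missing; // returned by get() for absent keys

  LongLongHashMap(int expectedEntries, long missingValue) {
    int cap = 1;
    while (cap < expectedEntries * 2) {
      cap <<= 1;
    }
    keys = new long[cap];
    values = new long[cap];
    used = new boolean[cap];
    missing = missingValue;
  }

  // MurmurHash3 fmix64 finalizer: spreads sequential blockIds across slots.
  private static long mix(long k) {
    k ^= k >>> 33;
    k *= 0xff51afd7ed558ccdL;
    k ^= k >>> 33;
    k *= 0xc4ceb9fe1a85ec53L;
    k ^= k >>> 33;
    return k;
  }

  // Returns the slot holding key, or the empty slot where it would go.
  // Terminates because the load factor keeps at least one slot empty.
  private static int slot(long key, long[] ks, boolean[] us) {
    int mask = ks.length - 1;
    int i = (int) (mix(key) & mask);
    while (us[i] && ks[i] != key) {
      i = (i + 1) & mask;
    }
    return i;
  }

  void put(long key, long value) {
    if ((size + 1) * 2 > keys.length) {
      grow();
    }
    int i = slot(key, keys, used);
    if (!used[i]) {
      used[i] = true;
      keys[i] = key;
      size++;
    }
    values[i] = value;
  }

  long get(long key) {
    int i = slot(key, keys, used);
    return used[i] ? values[i] : missing;
  }

  int size() {
    return size;
  }

  // Doubles capacity and reinserts every occupied entry.
  private void grow() {
    long[] oldKeys = keys;
    long[] oldValues = values;
    boolean[] oldUsed = used;
    int cap = oldKeys.length * 2;
    keys = new long[cap];
    values = new long[cap];
    used = new boolean[cap];
    for (int j = 0; j < oldKeys.length; j++) {
      if (oldUsed[j]) {
        int i = slot(oldKeys[j], keys, used);
        used[i] = true;
        keys[i] = oldKeys[j];
        values[i] = oldValues[j];
      }
    }
  }
}
```

Each entry costs 17 bytes of array space (two longs plus one boolean) with no object header, which is the appeal over a HashMap of boxed Longs.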
> Namenode memory optimization - Block replicas list
> ---------------------------------------------------
>
> Key: HDFS-6658
> URL: https://issues.apache.org/jira/browse/HDFS-6658
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: namenode
> Affects Versions: 2.4.1
> Reporter: Amir Langer
> Assignee: Daryn Sharp
> Attachments: BlockListOptimizationComparison.xlsx,
> BlocksMap redesign.pdf, HDFS-6658.patch,
> Namenode Memory Optimizations - Block replicas list.docx
>
>
> Part of the memory consumed by every BlockInfo object in the Namenode is a
> linked list of block references for every DatanodeStorageInfo (called
> "triplets").
> We propose to change the way we store the list in memory.
> Using primitive integer indexes instead of object references will reduce the
> memory needed for every block replica (when compressed oops are disabled), and
> in our new design the list overhead will be per DatanodeStorageInfo rather
> than per block replica.
> See the attached design doc for details and evaluation results.
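The proposal above can be sketched as replica lists threaded through int arrays: each replica slot holds a block index plus next/prev slot indexes, and each storage keeps only the head index of its list. This is a hypothetical illustration (ReplicaLists, add, remove and blocksOn are invented names), not the design doc's layout. On a 64-bit JVM with compressed oops disabled, each object reference costs 8 bytes, whereas each int index here costs 4.

```java
import java.util.Arrays;

// Hypothetical sketch: per-storage replica lists stored in primitive int
// arrays instead of object-reference "triplets". -1 plays the role of null.
class ReplicaLists {
  private static final int NIL = -1;
  private final int[] blockOf; // slot -> index of the block this replica belongs to
  private final int[] next;    // slot -> next slot in the same storage's list
  private final int[] prev;    // slot -> previous slot (doubly linked for O(1) removal)
  private final int[] head;    // storage index -> first slot of its list
  private int freeHead;        // free slots, chained through next[]

  ReplicaLists(int maxReplicas, int numStorages) {
    blockOf = new int[maxReplicas];
    next = new int[maxReplicas];
    prev = new int[maxReplicas];
    head = new int[numStorages];
    Arrays.fill(head, NIL);
    for (int i = 0; i < maxReplicas; i++) {
      next[i] = (i + 1 < maxReplicas) ? i + 1 : NIL;
    }
    freeHead = maxReplicas > 0 ? 0 : NIL;
  }

  // Pushes a replica of blockIndex onto storage's list; returns its slot.
  int add(int storage, int blockIndex) {
    if (freeHead == NIL) {
      throw new IllegalStateException("no free replica slots");
    }
    int s = freeHead;
    freeHead = next[s];
    blockOf[s] = blockIndex;
    prev[s] = NIL;
    next[s] = head[storage];
    if (head[storage] != NIL) {
      prev[head[storage]] = s;
    }
    head[storage] = s;
    return s;
  }

  // Unlinks slot s from storage's list and returns it to the free list.
  void remove(int storage, int s) {
    if (prev[s] != NIL) {
      next[prev[s]] = next[s];
    } else {
      head[storage] = next[s];
    }
    if (next[s] != NIL) {
      prev[next[s]] = prev[s];
    }
    next[s] = freeHead;
    freeHead = s;
  }

  // Block indexes on a storage, most recently added first.
  int[] blocksOn(int storage) {
    int n = 0;
    for (int s = head[storage]; s != NIL; s = next[s]) {
      n++;
    }
    int[] out = new int[n];
    int k = 0;
    for (int s = head[storage]; s != NIL; s = next[s]) {
      out[k++] = blockOf[s];
    }
    return out;
  }
}
```

Removal is O(1) given the slot because the list is doubly linked, and freed slots are recycled through a free list chained via next[], so no per-replica objects are ever allocated.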
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)