[ 
https://issues.apache.org/jira/browse/HDFS-6658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14380666#comment-14380666
 ] 

Daryn Sharp commented on HDFS-6658:
-----------------------------------

bq.  Why did you roll your own bitset instead of using the Java bitset?

Per the comments in the class: "... The notable features this implementation 
provides is modification tracking and capacity growth and shrinking ...".  The 
JDK BitSet doesn't provide public methods to control capacity.  You must use 
contortions based on knowledge of undocumented internal behavior.

You must start by allocating a bitset with no initial capacity so the size 
isn't "sticky" (a sticky size prevents it from ever shrinking).  You can then 
double-flip the bit at the desired capacity to force growth.  Shrinking to fit 
requires a clone().  Since you can't know in advance whether the bitset can 
actually shrink, cloning it based on a mere _possibility_ is rather extreme.
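
For illustration, the contortions look roughly like this with the stock 
{{java.util.BitSet}}; this is a sketch against current JDK internals (the 
undocumented "sticky size" behavior), not a supported API, and the class name 
is mine:

{code:java}
import java.util.BitSet;

public class BitSetCapacityDemo {
  public static void main(String[] args) {
    // 1. Use the no-arg constructor so the size is not "sticky";
    //    new BitSet(nbits) marks the size sticky and clone() would never trim.
    BitSet bits = new BitSet();

    // 2. Double-flip the highest desired bit to force capacity growth.
    int capacity = 1_000_000;    // hypothetical block-report size
    bits.set(capacity - 1);      // grows the internal word array
    bits.clear(capacity - 1);    // capacity stays; the bit itself is cleared
    System.out.println("after grow:  " + bits.size());  // >= 1,000,000 bits

    // 3. "Shrink to fit" means cloning the entire object, even though we
    //    can't tell in advance whether it will actually shrink.
    bits = (BitSet) bits.clone();
    System.out.println("after clone: " + bits.size());  // trimmed back down
  }
}
{code}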

When the JDK bitset expands, it grows to the _greater_ of the requested size 
or 2x the current size.  Doubling a large bitset just to make room for a small 
number of additional bits is pretty extreme too.  (I just noticed my current 
growth of len+1 should be a little more aggressive, though not 2x aggressive; 
see the sketch below.)
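
A middle-ground policy could look something like the following; this is purely 
illustrative, not what the patch currently does:

{code:java}
/** Purely illustrative growth policy: neither the patch's len+1 nor the JDK's 2x. */
public class GrowthPolicyDemo {
  static int grownLength(int currentLen, int requiredLen) {
    int grown = currentLen + (currentLen >> 1) + 1;  // ~1.5x, and always at least +1
    return Math.max(grown, requiredLen);             // never less than requested
  }

  public static void main(String[] args) {
    System.out.println(grownLength(1000, 1001));  // 1501: headroom without doubling
    System.out.println(grownLength(1000, 5000));  // 5000: the requested length wins
  }
}
{code}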

The custom impl provides full control of capacity.  The storage's capacity is 
primed with the number of blocks in the BR.  When the report has finished 
being processed, the storage is told to shrink to fit if necessary.  Shrinking 
doesn't clone the entire object, and doesn't introduce the future 
thread-safety issue of the reference being swapped.
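
For concreteness, a minimal sketch of that pattern under assumed names 
(nothing below is the actual patch API):

{code:java}
/**
 * Minimal sketch of the capacity control described above: capacity is primed
 * up front, growth happens in place, and shrink-to-fit trims the backing
 * array without cloning the object or swapping its reference.
 */
public class CapacityControlSketch {
  private long[] words = new long[1];
  private int highestSetBit = -1;

  /** Grow the backing array, in place, to hold at least the given number of bits. */
  void ensureCapacity(int bits) {
    int needed = (bits + 63) >>> 6;              // words required, rounded up
    if (needed > words.length) {
      long[] bigger = new long[needed];
      System.arraycopy(words, 0, bigger, 0, words.length);
      words = bigger;
    }
  }

  void set(int bit) {
    ensureCapacity(bit + 1);
    words[bit >>> 6] |= 1L << (bit & 63);
    highestSetBit = Math.max(highestSetBit, bit);
  }

  /** Trim the backing array to the highest set bit; no clone() involved. */
  void shrinkToFit() {
    int needed = Math.max(1, (highestSetBit + 64) >>> 6);
    if (needed < words.length) {
      long[] smaller = new long[needed];
      System.arraycopy(words, 0, smaller, 0, needed);
      words = smaller;
    }
  }

  int capacityBits() {
    return words.length * Long.SIZE;
  }

  public static void main(String[] args) {
    CapacityControlSketch replicas = new CapacityControlSketch();
    replicas.ensureCapacity(200_000);             // prime with the BR's block count
    replicas.set(1_234);                          // ... process the report ...
    System.out.println(replicas.capacityBits());  // >= 200,000
    replicas.shrinkToFit();                       // trim once the report is done
    System.out.println(replicas.capacityBits());  // just enough for the set bits
  }
}
{code}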

Lastly, it provides a modCount to aid in detecting concurrent modifications 
for {{BlockNumbers}}, which uses the bitset to densely pack its sparse array.  
Its iterator can tolerate concurrent modifications unless they occur between 
hasNext() & next() (see code comments).
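
To make the hasNext/next caveat concrete, here is a rough sketch of such an 
iterator; class and field names are illustrative, not the patch's:

{code:java}
import java.util.NoSuchElementException;

/**
 * Sketch of why the iterator tolerates concurrent modification except between
 * hasNext() and next(): hasNext() scans the live words for the next set bit,
 * so a clear() landing after hasNext() but before next() leaves a stale index.
 */
class SetBitIterator {
  private final long[] words;  // live view of the backing bitset, no snapshot
  private int cursor = 0;      // bit index to resume scanning from
  private int found = -1;      // bit located by the last hasNext()

  SetBitIterator(long[] words) {
    this.words = words;
  }

  boolean hasNext() {
    for (int i = cursor; i < words.length * Long.SIZE; i++) {
      if ((words[i >>> 6] & (1L << (i & 63))) != 0) {
        found = i;             // the vulnerable window starts here
        return true;
      }
    }
    found = -1;
    return false;
  }

  int next() {
    if (found < 0 && !hasNext()) {
      throw new NoSuchElementException();
    }
    int bit = found;           // stale if a clear() raced in since hasNext()
    cursor = bit + 1;
    found = -1;
    return bit;
  }

  public static void main(String[] args) {
    long[] words = new long[2];
    words[0] |= 1L << 3;                // set bits 3 and 70
    words[1] |= 1L << (70 & 63);
    SetBitIterator it = new SetBitIterator(words);
    while (it.hasNext()) {
      System.out.println(it.next());    // prints 3, then 70
    }
  }
}
{code}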

I originally intended to make this fully thread-safe, but soon realized that 
subtle race conditions would surely be introduced.  I chose to make the 
minimal change required to the current design.  That said, I have considered 
how to make the structures thread-safe in the future.

Send me an email and we can set up a time to chat.

> Namenode memory optimization - Block replicas list 
> ---------------------------------------------------
>
>                 Key: HDFS-6658
>                 URL: https://issues.apache.org/jira/browse/HDFS-6658
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: namenode
>    Affects Versions: 2.4.1
>            Reporter: Amir Langer
>            Assignee: Daryn Sharp
>         Attachments: BlockListOptimizationComparison.xlsx, BlocksMap 
> redesign.pdf, HDFS-6658.patch, HDFS-6658.patch, HDFS-6658.patch, Namenode 
> Memory Optimizations - Block replicas list.docx, New primative indexes.jpg, 
> Old triplets.jpg
>
>
> Part of the memory consumed by every BlockInfo object in the Namenode is a 
> linked list of block references for every DatanodeStorageInfo (called 
> "triplets"). 
> We propose to change the way we store the list in memory. 
> Using primitive integer indexes instead of object references will reduce the 
> memory needed for every block replica (when compressed oops is disabled) and 
> in our new design the list overhead will be per DatanodeStorageInfo and not 
> per block replica.
> See the attached design doc for details and evaluation results.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
