[
https://issues.apache.org/jira/browse/HDFS-6658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14380666#comment-14380666
]
Daryn Sharp commented on HDFS-6658:
-----------------------------------
bq. Why did you roll your own bitset instead of using the Java bitset?
Per the comments in the class: "... The notable features this implementation
provides is modification tracking and capacity growth and shrinking ...". The
JDK BitSet doesn't provide public methods to control capacity. You must use
contortions based on knowledge of undocumented internal behavior.
You must start by allocating a bitset with no capacity so its size isn't
initially "sticky" (i.e., won't shrink). You can then double-flip the desired
capacity bit to force growth. Shrink-to-fit requires a clone(). You don't know
in advance whether the bitset can actually shrink, so cloning it based on a
_possibility_ is rather extreme.
When the bitset expands, it allocates the _greater_ of the requested size or 2x
the current size. Doubling a large bitset just to make room for a small number
of additional bits is pretty extreme too. (I just noticed I should grow a
little more aggressively than the current len+1, but not as aggressively as 2x.)
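For illustration, the contortions against the JDK BitSet look roughly like the
sketch below. It leans entirely on undocumented internals (the "sticky size"
flag, doubling on growth, and the trim performed during clone()), so none of
this is guaranteed behavior:
{code:java}
import java.util.BitSet;

public class BitSetCapacityDemo {
  public static void main(String[] args) {
    // No-arg constructor: the internal size is not "sticky", so clone() is allowed to trim it.
    BitSet bits = new BitSet();
    System.out.println("initial capacity (bits): " + bits.size());   // 64

    // Force growth to ~1M bits by flipping the highest desired bit twice.
    int wanted = 1 << 20;
    bits.set(wanted - 1);
    bits.clear(wanted - 1);
    System.out.println("after double flip:       " + bits.size());   // grew to max(2x current, requested)

    // The only shrink-to-fit is the implicit trim during clone(), which copies the whole thing.
    bits.set(100);                                                    // only low bits actually in use
    BitSet trimmed = (BitSet) bits.clone();
    System.out.println("after clone:             " + trimmed.size()); // back down to a couple of words
  }
}
{code}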
The custom impl provides full control of capacity. The storage's capacity is
primed with the number of blocks in the BR. When report processing finishes,
the storage is told to shrink to fit if necessary. Shrinking won't clone the
entire object and won't introduce the future thread-safety issue of the
reference being swapped out.
Lastly, it provides a modCount to aid in detecting concurrent modifications for
{{BlockNumbers}}, which uses the bitset to densely pack its sparse array. Its
iterator can tolerate concurrent modifications unless they occur between
hasNext & next (see code comments).
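To make those features concrete, here's a minimal sketch of the idea
(illustrative only: it is not the class in the patch, the names and growth
policy are invented, and the iterator is omitted):
{code:java}
import java.util.Arrays;

/**
 * Minimal sketch of a bit set with caller-controlled capacity and a modCount.
 * Illustrative only; not the class in the patch.
 */
class ResizableBitSet {
  private long[] words;
  private int modCount;   // bumped on every mutation; an iterator can snapshot and compare it

  ResizableBitSet(int capacityBits) {
    words = new long[wordsFor(capacityBits)];   // e.g. primed with the block count from the BR
  }

  private static int wordsFor(int bits) {
    return Math.max(1, (bits + 63) >>> 6);
  }

  void set(int i) {
    ensureCapacity(i + 1);
    words[i >>> 6] |= 1L << i;    // long shifts are mod 64, so this picks the bit within the word
    modCount++;
  }

  void clear(int i) {
    if ((i >>> 6) < words.length) {
      words[i >>> 6] &= ~(1L << i);
      modCount++;
    }
  }

  boolean get(int i) {
    int w = i >>> 6;
    return w < words.length && (words[w] & (1L << i)) != 0;
  }

  /** Grow in place; a real implementation would over-allocate somewhat
   *  (more than len+1, less than 2x) to amortize repeated growth. */
  void ensureCapacity(int bits) {
    int needed = wordsFor(bits);
    if (needed > words.length) {
      words = Arrays.copyOf(words, needed);
    }
  }

  /** Shrink in place to the highest word still in use. Only the backing
   *  array is reallocated; the object reference never changes. */
  void shrinkToFit() {
    int inUse = words.length;
    while (inUse > 1 && words[inUse - 1] == 0) {
      inUse--;
    }
    if (inUse < words.length) {
      words = Arrays.copyOf(words, inUse);
      modCount++;
    }
  }

  int getModCount() {
    return modCount;
  }
}
{code}
The point of shrinkToFit() is that only the backing array is reallocated;
callers keep the same object reference, which is what avoids the
reference-swap thread-safety concern mentioned above.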
I originally started with the intent to be fully thread-safe, but soon realized
that subtle race conditions would surely be introduced. I chose to make the
minimal change required to the current design. That said, I have considered how
to make the structures thread-safe in the future.
Send me an email and we can set up a time to chat.
> Namenode memory optimization - Block replicas list
> ---------------------------------------------------
>
> Key: HDFS-6658
> URL: https://issues.apache.org/jira/browse/HDFS-6658
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: namenode
> Affects Versions: 2.4.1
> Reporter: Amir Langer
> Assignee: Daryn Sharp
> Attachments: BlockListOptimizationComparison.xlsx, BlocksMap
> redesign.pdf, HDFS-6658.patch, HDFS-6658.patch, HDFS-6658.patch, Namenode
> Memory Optimizations - Block replicas list.docx, New primative indexes.jpg,
> Old triplets.jpg
>
>
> Part of the memory consumed by every BlockInfo object in the Namenode is a
> linked list of block references for every DatanodeStorageInfo (called
> "triplets").
> We propose to change the way we store the list in memory.
> Using primitive integer indexes instead of object references will reduce the
> memory needed for every block replica (when compressed oops is disabled) and
> in our new design the list overhead will be per DatanodeStorageInfo and not
> per block replica.
> See the attached design doc for details and evaluation results.