[
https://issues.apache.org/jira/browse/HDFS-8859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14661499#comment-14661499
]
Yi Liu edited comment on HDFS-8859 at 8/7/15 8:31 AM:
------------------------------------------------------
The {{LightWeightHashGSet}} implemented in the patch is a low-memory-footprint
{{GSet}} implementation that uses an array to store the elements and linked
lists for collision resolution. When the number of elements exceeds the resize
threshold, the internal array is resized to double its length. The default
load factor is 0.75f, the same as Java's {{HashMap}}.
Currently {{LightWeightHashGSet}} does not shrink when elements are removed and
the size drops below some threshold; I don't think that is necessary for our
case. If you think we should have it, I can do it in a follow-on.
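The structure described above can be sketched in Java as follows. This is a minimal sketch under stated assumptions, not the patch's {{LightWeightHashGSet}}: the names {{HashElement}}, {{ResizableHashGSet}} and {{BlockEntry}} are hypothetical, each element is assumed to expose its own key, the capacity is kept a power of two, and the array doubles once the size exceeds capacity * 0.75. As in the comment, removal never shrinks the array.

```java
// Hypothetical names (HashElement, ResizableHashGSet, BlockEntry): a sketch of a
// resizable hash set whose elements carry their own key and "next" link.
interface HashElement<K> {
  K getKey();
  HashElement<K> getNext();
  void setNext(HashElement<K> next);
}

class ResizableHashGSet<K, E extends HashElement<K>> {
  private static final float LOAD_FACTOR = 0.75f; // same default as java.util.HashMap
  private HashElement<K>[] buckets;
  private int size;
  private int threshold;

  @SuppressWarnings("unchecked")
  ResizableHashGSet(int initialCapacity) {
    int cap = 1;
    while (cap < initialCapacity) cap <<= 1; // power of two, so index = hash & (length - 1)
    buckets = (HashElement<K>[]) new HashElement[cap];
    threshold = (int) (cap * LOAD_FACTOR);
  }

  private static int indexFor(Object key, int length) {
    int h = key.hashCode();
    return (h ^ (h >>> 16)) & (length - 1); // mix high bits into the low bits
  }

  @SuppressWarnings("unchecked")
  E get(K key) {
    for (HashElement<K> e = buckets[indexFor(key, buckets.length)]; e != null; e = e.getNext()) {
      if (e.getKey().equals(key)) return (E) e;
    }
    return null;
  }

  /** Inserts the element, replacing any element with the same key; returns the old one. */
  E put(E element) {
    E old = remove(element.getKey());
    int i = indexFor(element.getKey(), buckets.length);
    element.setNext(buckets[i]); // prepend to the collision chain
    buckets[i] = element;
    if (++size > threshold) resize(buckets.length * 2);
    return old;
  }

  /** Unlinks and returns the element with this key; the array is never shrunk. */
  @SuppressWarnings("unchecked")
  E remove(K key) {
    int i = indexFor(key, buckets.length);
    HashElement<K> prev = null;
    for (HashElement<K> e = buckets[i]; e != null; prev = e, e = e.getNext()) {
      if (e.getKey().equals(key)) {
        if (prev == null) buckets[i] = e.getNext(); else prev.setNext(e.getNext());
        e.setNext(null);
        size--;
        return (E) e;
      }
    }
    return null;
  }

  int size() { return size; }
  int capacity() { return buckets.length; }

  // Rehash every element into an array of double length; elements are relinked, not copied.
  @SuppressWarnings("unchecked")
  private void resize(int newLength) {
    HashElement<K>[] newBuckets = (HashElement<K>[]) new HashElement[newLength];
    for (HashElement<K> head : buckets) {
      for (HashElement<K> e = head; e != null; ) {
        HashElement<K> next = e.getNext();
        int i = indexFor(e.getKey(), newLength);
        e.setNext(newBuckets[i]);
        newBuckets[i] = e;
        e = next;
      }
    }
    buckets = newBuckets;
    threshold = (int) (newLength * LOAD_FACTOR);
  }
}

// Hypothetical element type standing in for a replica keyed by block ID.
class BlockEntry implements HashElement<Long> {
  private final long blockId;
  private HashElement<Long> next;
  BlockEntry(long blockId) { this.blockId = blockId; }
  public Long getKey() { return blockId; }
  public HashElement<Long> getNext() { return next; }
  public void setNext(HashElement<Long> next) { this.next = next; }
}

class ResizableHashGSetDemo {
  public static void main(String[] args) {
    ResizableHashGSet<Long, BlockEntry> set = new ResizableHashGSet<>(16);
    for (long id = 0; id < 100; id++) set.put(new BlockEntry(id));
    System.out.println(set.size() + " " + set.capacity()); // 100 256
    for (long id = 0; id < 100; id++) set.remove(id);
    System.out.println(set.size() + " " + set.capacity()); // 0 256 (no shrinking)
  }
}
```

Because each element stores its own key and {{next}} link, the set allocates no per-element entry objects and retains no separate boxed key. In this sketch {{getKey()}} boxes the block ID on each call, but the set never keeps that boxed value.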
As shown in the patch, {{ReplicaInfo}} now needs to implement
{{LightWeightHashGSet.LinkedElement}}, and {{ReplicaMap}} is modified to use
this new lightweight set.
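To illustrate what implementing a linked-element interface means for a replica class, here is a minimal Java sketch. The interface shape (a {{setNext}}/{{getNext}} pair) is assumed to mirror Hadoop's existing {{LightWeightGSet.LinkedElement}}; the patch's {{LightWeightHashGSet.LinkedElement}} may differ. {{ReplicaSketch}} is a hypothetical stand-in that keeps only the block ID, not the real {{ReplicaInfo}}.

```java
// Hypothetical sketch: LinkedElement mirrors the setNext/getNext shape of Hadoop's
// LightWeightGSet.LinkedElement; ReplicaSketch is NOT the real ReplicaInfo.
interface LinkedElement {
  void setNext(LinkedElement next);
  LinkedElement getNext();
}

class ReplicaSketch implements LinkedElement {
  private final long blockId;  // the block ID doubles as the lookup key
  private LinkedElement next;  // the single extra reference added per replica

  ReplicaSketch(long blockId) {
    this.blockId = blockId;
  }

  long getBlockId() {
    return blockId;
  }

  @Override
  public void setNext(LinkedElement next) {
    this.next = next;
  }

  @Override
  public LinkedElement getNext() {
    return next;
  }

  // Equality and hashing by block ID let a set match an element against a lookup key.
  @Override
  public boolean equals(Object o) {
    return o instanceof ReplicaSketch && ((ReplicaSketch) o).blockId == blockId;
  }

  @Override
  public int hashCode() {
    return Long.hashCode(blockId);
  }
}

class ReplicaSketchDemo {
  public static void main(String[] args) {
    // Two replicas that landed in the same bucket are chained through their own fields.
    ReplicaSketch a = new ReplicaSketch(1001L);
    ReplicaSketch b = new ReplicaSketch(2002L);
    a.setNext(b);
    System.out.println(((ReplicaSketch) a.getNext()).getBlockId()); // 2002
  }
}
```

The only per-replica cost of this design is the {{next}} field, which corresponds to the 4-byte "reference to next element in ReplicaInfo" counted in the description.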
By using the new lightweight set, we get the benefits described in the JIRA
description (a large reduction in the DataNode (ReplicaMap) memory footprint).
Please review, thanks.
was (Author: hitliuyi):
The {{LightWeightHashGSet}} implemented in the patch is a low-memory-footprint
{{GSet}} implementation that uses an array to store the elements and linked
lists for collision resolution. When the number of elements exceeds the resize
threshold, the internal array is resized to double its length. The default
load factor is 0.75f, the same as Java's {{HashMap}}.
Currently {{LightWeightHashGSet}} does not shrink when elements are removed and
the size drops below some threshold; I don't think that is necessary for our
case. If you think we should have it, I can do it in a follow-on.
As shown in the patch, {{ReplicaInfo}} now needs to implement
{{LightWeightHashGSet.LinkedElement}}, and {{ReplicaMap}} is modified to use
this new lightweight set.
By using the new lightweight set, we get the benefits described in the JIRA
description (a large reduction in memory footprint).
> Improve DataNode (ReplicaMap) memory footprint to save about 45%
> ----------------------------------------------------------------
>
> Key: HDFS-8859
> URL: https://issues.apache.org/jira/browse/HDFS-8859
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: datanode
> Reporter: Yi Liu
> Assignee: Yi Liu
> Priority: Critical
> Attachments: HDFS-8859.001.patch
>
>
> By using the following approach we can save about *45%* of the memory
> footprint for each block replica in DataNode memory (this JIRA only covers
> *ReplicaMap* in the DataNode). The details are:
> In ReplicaMap,
> {code}
> private final Map<String, Map<Long, ReplicaInfo>> map =
> new HashMap<String, Map<Long, ReplicaInfo>>();
> {code}
> Currently we use a HashMap {{Map<Long, ReplicaInfo>}} to store the replicas
> in memory. The key is the block ID of the block replica, which is already
> included in {{ReplicaInfo}}, so this memory can be saved. In addition, each
> HashMap Entry carries an object overhead. We can implement a lightweight Set
> that is similar to {{LightWeightGSet}} but not fixed-size ({{LightWeightGSet}}
> uses a fixed size for the entries array, usually a large value, e.g. in
> {{BlocksMap}}; this avoids full GC because the array never needs to be
> resized). We should also be able to get an element by its key.
> The following compares the memory footprint if we implement a lightweight set
> as described.
> We can save:
> {noformat}
> SIZE (bytes)  ITEM
> 20            The key: Long (12 bytes object overhead + 8 bytes long)
> 12            HashMap Entry object overhead
> 4             reference to the key in Entry
> 4             reference to the value in Entry
> 4             hash in Entry
> {noformat}
> Total: -44 bytes
> We need to add:
> {noformat}
> SIZE (bytes)  ITEM
> 4             a reference to the next element in ReplicaInfo
> {noformat}
> Total: +4 bytes
> So in total we save 40 bytes for each block replica.
> Currently, one finalized replica needs around 46 bytes (note: we ignore
> memory alignment here).
> We can therefore save 1 - (4 + 46) / (44 + 46) = 1 - 50/90 ≈ 44.4%, i.e. about
> *45%*, of the memory for each block replica in the DataNode.
>
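The arithmetic in the quoted description can be reproduced with a short Java calculation. It only restates the description's own byte counts (12-byte object headers, 4-byte references, about 46 bytes per finalized replica, alignment ignored); the class and constant names are illustrative. The exact result is 1 - 50/90 ≈ 44.4%, which the description reports as about 45%.

```java
import java.util.Locale;

// Byte counts from the description's tables (alignment ignored; 4-byte references assumed).
class ReplicaMapSavings {
  static final int BOXED_LONG_KEY = 12 + 8;   // Long: 12-byte header + 8-byte long
  static final int ENTRY_HEADER = 12;         // HashMap Entry object overhead
  static final int ENTRY_KEY_REF = 4;         // reference to the key in the Entry
  static final int ENTRY_VALUE_REF = 4;       // reference to the value in the Entry
  static final int ENTRY_HASH = 4;            // hash stored in the Entry
  static final int NEXT_REF = 4;              // new "next" reference in ReplicaInfo
  static final int FINALIZED_REPLICA = 46;    // approximate size of one finalized replica

  static int removedBytes() {
    return BOXED_LONG_KEY + ENTRY_HEADER + ENTRY_KEY_REF + ENTRY_VALUE_REF + ENTRY_HASH;
  }

  static int netSavedBytes() {
    return removedBytes() - NEXT_REF;
  }

  static double savedFraction() {
    double before = removedBytes() + FINALIZED_REPLICA; // 44 + 46 = 90
    double after = NEXT_REF + FINALIZED_REPLICA;        // 4 + 46 = 50
    return 1.0 - after / before;
  }

  public static void main(String[] args) {
    System.out.println(String.format(Locale.ROOT, "removed=%d net=%d saved=%.1f%%",
        removedBytes(), netSavedBytes(), savedFraction() * 100));
    // prints: removed=44 net=40 saved=44.4%
  }
}
```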
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)