[
https://issues.apache.org/jira/browse/HBASE-7391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13732235#comment-13732235
]
ramkrishna.s.vasudevan commented on HBASE-7391:
-----------------------------------------------
{code}
if (recoveredEdits) {
  // The region and table names never change within a recovered.edits file,
  // so a single entry is enough for each of these dictionaries.
  regionDict.init(1);
  tableDict.init(1);
  rowDict.init(Short.MAX_VALUE);
} else {
  regionDict.init(Short.MAX_VALUE);
  tableDict.init(Short.MAX_VALUE);
  rowDict.init(Short.MAX_VALUE);
}
familyDict.init(Byte.MAX_VALUE);
qualifierDict.init(Byte.MAX_VALUE);
{code}
If we make a change like the one above for the recovered.edits case, as described
in the issue description, we would cut the memory from ~5 MB to ~1 MB (roughly a
five-fold reduction) for every writer instantiated.
For the region name and table name we know for sure there will never be more than
one entry.
For the family and qualifier dictionaries (both in the normal case and for
recovered edits) we could use Byte.MAX_VALUE. The dictionary is LRU-style anyway,
so any use case with more than 127 qualifiers per CF would just recycle entries
in round-robin fashion.
Let the rowDict stay at Short.MAX_VALUE.
With the per-writer size down from ~5 MB to ~1 MB, the overall footprint for 1500
regions would come down from ~7 GB to about 1.5 GB. What do you think of this
change?
I can submit a patch based on this.
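The arithmetic above can be double-checked with a quick back-of-envelope sketch
(the class and method names below are hypothetical; the only inputs are the
32-byte Node size and the five dictionaries from the description):

```java
// Back-of-envelope check of the dictionary memory math above.
// Assumptions (from the issue description): each dictionary entry is a
// 32-byte Node object, and every WAL writer holds five dictionaries.
public class WalDictFootprint {
    static final int NODE_BYTES = 32;

    // Total bytes for one writer, given the entry count of each dictionary.
    static long perWriterBytes(int region, int table, int row,
                               int family, int qualifier) {
        return (long) NODE_BYTES
                * ((long) region + table + row + family + qualifier);
    }

    public static void main(String[] args) {
        // Current sizing: all five dictionaries at Short.MAX_VALUE (32767).
        long before = perWriterBytes(Short.MAX_VALUE, Short.MAX_VALUE,
                Short.MAX_VALUE, Short.MAX_VALUE, Short.MAX_VALUE);
        // Proposed recovered.edits sizing from the snippet above:
        // 1, 1, Short.MAX_VALUE, Byte.MAX_VALUE, Byte.MAX_VALUE.
        long after = perWriterBytes(1, 1, Short.MAX_VALUE,
                Byte.MAX_VALUE, Byte.MAX_VALUE);

        System.out.printf("per writer: %.1f MB -> %.1f MB%n",
                before / (double) (1 << 20), after / (double) (1 << 20));
        System.out.printf("1500 regions: %.1f GB -> %.1f GB%n",
                before * 1500.0 / (1L << 30), after * 1500.0 / (1L << 30));
    }
}
```

Running this gives ~5 MB before vs. ~1 MB after per writer, and ~7.3 GB vs.
~1.5 GB across 1500 regions, which matches the numbers quoted above.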
> Review/improve HLog compression's memory consumption
> ----------------------------------------------------
>
> Key: HBASE-7391
> URL: https://issues.apache.org/jira/browse/HBASE-7391
> Project: HBase
> Issue Type: Bug
> Reporter: Jean-Daniel Cryans
> Assignee: ramkrishna.s.vasudevan
> Fix For: 0.95.2
>
>
> From Ram in
> http://mail-archives.apache.org/mod_mbox/hbase-dev/201205.mbox/%3C00bc01cd31e6$7caf1320$760d3960$%[email protected]%3E:
> {quote}
> One small observation after giving +1 on the RC.
> The WAL compression feature causes OOME and causes Full GC.
> The problem is, if we have 1500 regions, I need to create recovered.edits
> for each region (I don’t have much data in the regions (~300MB)).
> Now when I try to build the dictionary there is a Node object getting
> created.
> Each node object occupies 32 bytes.
> We have 5 such dictionaries.
> Initially we create indexToNodes array and its size is 32767.
> So now we have 32*5*32767 = ~5MB.
> Now I have 1500 regions.
> So 5MB*1500 = ~7GB (excluding actual data). This seems to be a very high
> initial memory footprint; it never allows me to split the logs and I am
> not able to bring the cluster up at all.
> Our configured heap size was 8GB, tested in a 3-node cluster with 5000
> regions and very little data (1GB in the hdfs cluster including
> replication), some small data spread evenly across all regions.
> The formula is 32 (Node object size) * 5 (number of dictionaries) * 32767
> (number of Node objects) * number of regions.
> {quote}
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira