[
https://issues.apache.org/jira/browse/HBASE-24859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17185275#comment-17185275
]
Rushabh Shah commented on HBASE-24859:
--------------------------------------
> Can you add some data that gives some insight into memory usage? Like Xmx
> limits on the client JVM, no. of regions, key lengths, top 5-10 contributors
> (by %, based on a heap dump analysis etc)? I'm wondering if we can do some
> simple optimizations like dedup with interning, avoid unnecessary copies etc
> and get a reasonable improvement in the memory usage.
Xmx on client: 8GB
Number of regions in table: 20,321
I haven't looked at the key lengths, but below is a screenshot of the heap
histogram, if that helps.
!hbase-24859.png!
I haven't studied the heap histogram in detail, but it looks like even if we
do some optimization, this problem will bite us again if the number of regions
grows in the future.
> A concern I have about this proposal is we can exclude regions that are
> "empty" when calculating mapreduce splits, but then how do we know they are
> still empty when the MR tasks are finally launched?
[~apurtell] I agree with this concern. Would it alleviate your concern if we
introduced an expert-level config property, something like
hbase.mapreduce.ignore.empty.regions, that defaults to false so that existing
behavior does not change?
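
To make the proposal concrete, here is a minimal sketch of how such an opt-in
flag could gate the filtering. It is not HBase code: the Region class, the
selectRegionsForSplits helper, and the use of a JVM system property are
hypothetical stand-ins (in HBase the flag would presumably be read with
Hadoop's Configuration.getBoolean, and the region size would come from
whatever size information the split calculation already has). The property
name is the one proposed above and does not exist today.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class EmptyRegionSplitSketch {
    // Property name proposed in this comment; hypothetical, not an existing HBase setting.
    static final String IGNORE_EMPTY_REGIONS_KEY = "hbase.mapreduce.ignore.empty.regions";
    // Defaults to false so that existing behavior does not change.
    static final boolean IGNORE_EMPTY_REGIONS_DEFAULT = false;

    /** Hypothetical stand-in for per-region information; not an HBase class. */
    static final class Region {
        final String name;
        final long sizeBytes;

        Region(String name, long sizeBytes) {
            this.name = name;
            this.sizeBytes = sizeBytes;
        }
    }

    /**
     * Returns the names of the regions that would get a split.
     * Empty regions are skipped only when ignoreEmpty is true.
     */
    static List<String> selectRegionsForSplits(List<Region> regions, boolean ignoreEmpty) {
        List<String> selected = new ArrayList<>();
        for (Region region : regions) {
            if (ignoreEmpty && region.sizeBytes == 0) {
                continue; // no split object is created for this region
            }
            selected.add(region.name);
        }
        return selected;
    }

    public static void main(String[] args) {
        // Assumption: a JVM system property stands in for the job configuration.
        boolean ignoreEmpty = Boolean.parseBoolean(System.getProperty(
            IGNORE_EMPTY_REGIONS_KEY, String.valueOf(IGNORE_EMPTY_REGIONS_DEFAULT)));

        List<Region> regions = Arrays.asList(
            new Region("region-a", 0L),
            new Region("region-b", 1024L),
            new Region("region-c", 0L));

        System.out.println("ignoreEmpty=" + ignoreEmpty
            + " -> " + selectRegionsForSplits(regions, ignoreEmpty));
    }
}
```

Running it with -Dhbase.mapreduce.ignore.empty.regions=true drops the two
zero-size regions; without the flag, all three regions are kept. Note that the
flag only makes the behavior opt-in. It does not address the case where a
region that was empty at split time receives data before the MR tasks launch.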
> Remove the empty regions from the hbase mapreduce splits
> --------------------------------------------------------
>
> Key: HBASE-24859
> URL: https://issues.apache.org/jira/browse/HBASE-24859
> Project: HBase
> Issue Type: Improvement
> Components: mapreduce
> Reporter: Sandeep Pal
> Assignee: Sandeep Pal
> Priority: Major
> Attachments: hbase-24859.png
>
>
> It has been observed that when a table has too many regions, MR jobs
> consume more memory in the client. This is because we keep region-level
> information in memory, and the memory-heavy object is TableSplit because it
> holds a Scan object.
> We can reduce memory consumption by not loading the region-level
> information if the region is empty.