[
https://issues.apache.org/jira/browse/HBASE-24859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17185280#comment-17185280
]
Sandeep Pal edited comment on HBASE-24859 at 8/26/20, 5:49 PM:
---------------------------------------------------------------
[~bharathv] [~shahrs87] [~apurtell]
The heap is dominated by TableSplit objects, and in particular by the
[scan|https://github.com/apache/hbase/blob/master/hbase-mapreduce/src/main/java/org/apache/hadoop/hbase/mapreduce/TableSplit.java#L87]
held within each TableSplit.
!Screen Shot 2020-08-26 at 8.44.34 AM.png!
> Remove the empty regions from the hbase mapreduce splits
> --------------------------------------------------------
>
> Key: HBASE-24859
> URL: https://issues.apache.org/jira/browse/HBASE-24859
> Project: HBase
> Issue Type: Improvement
> Components: mapreduce
> Reporter: Sandeep Pal
> Assignee: Sandeep Pal
> Priority: Major
> Attachments: Screen Shot 2020-08-26 at 8.44.34 AM.png,
> hbase-24859.png, screenshot-1.png
>
>
> When a table has a very large number of regions, MR jobs consume more
> memory in the client. This is because we keep region-level information in
> memory, and the memory-heavy object is TableSplit, since each one carries
> a Scan object.
> We can reduce memory consumption by not loading region-level information
> for empty regions, controlled by a configuration setting.
> By default, all TableSplits are kept in memory (no change from current
> behavior), but the configuration can let the map-reduce job ignore empty
> regions. The configuration can be set per MR job.
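
To make the proposal concrete, here is a minimal sketch of the filtering idea in plain Java. It does not use the HBase API: the class name, the configuration key, and the map-based stand-ins for the job configuration and per-region sizes are all hypothetical, chosen only for illustration. The issue does not name a configuration key or say how region emptiness would be determined.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: a stand-in for the split-selection step, not HBase code.
class EmptyRegionSplitFilter {

    // Hypothetical per-job configuration key; the issue does not specify one.
    static final String SKIP_EMPTY_REGIONS_KEY = "example.mapreduce.skip.empty.regions";

    /**
     * Returns the names of regions that should receive a split.
     * regionSizes maps region name to size in bytes (assumed to be known up front).
     * conf is a simplified stand-in for the job configuration.
     */
    static List<String> regionsToSplit(Map<String, Long> regionSizes,
                                       Map<String, String> conf) {
        // Defaults to false, so existing jobs keep every split (no behavior change).
        boolean skipEmpty =
            Boolean.parseBoolean(conf.getOrDefault(SKIP_EMPTY_REGIONS_KEY, "false"));

        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Long> entry : regionSizes.entrySet()) {
            if (skipEmpty && entry.getValue() == 0L) {
                // Empty region: create no split, so no per-split Scan is held for it.
                continue;
            }
            result.add(entry.getKey());
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Long> sizes = new LinkedHashMap<>();
        sizes.put("region-a", 1024L);
        sizes.put("region-b", 0L);
        sizes.put("region-c", 2048L);

        Map<String, String> conf = new HashMap<>();
        System.out.println(regionsToSplit(sizes, conf)); // default: all regions kept

        conf.put(SKIP_EMPTY_REGIONS_KEY, "true");
        System.out.println(regionsToSplit(sizes, conf)); // opt-in: empty region dropped
    }
}
```

Making the flag opt-in matches the description above: jobs that do not set it see the same splits as today, and only jobs that enable it drop splits for empty regions.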