[ 
https://issues.apache.org/jira/browse/HBASE-24859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17185280#comment-17185280
 ] 

Sandeep Pal edited comment on HBASE-24859 at 8/26/20, 5:49 PM:
---------------------------------------------------------------

[~bharathv] [~shahrs87] [~apurtell]
The heap is dominated by TableSplit objects, in particular by the 
[scan|https://github.com/apache/hbase/blob/master/hbase-mapreduce/src/main/java/org/apache/hadoop/hbase/mapreduce/TableSplit.java#L87]
 field held inside each TableSplit.

 !Screen Shot 2020-08-26 at 8.44.34 AM.png! 




> Remove the empty regions from the hbase mapreduce splits
> --------------------------------------------------------
>
>                 Key: HBASE-24859
>                 URL: https://issues.apache.org/jira/browse/HBASE-24859
>             Project: HBase
>          Issue Type: Improvement
>          Components: mapreduce
>            Reporter: Sandeep Pal
>            Assignee: Sandeep Pal
>            Priority: Major
>         Attachments: Screen Shot 2020-08-26 at 8.44.34 AM.png, 
> hbase-24859.png, screenshot-1.png
>
>
> When a table has a very large number of regions, MR jobs consume more 
> memory in the client. This is because region-level information is kept in 
> memory, and the memory-heavy object is TableSplit because of the Scan 
> object it carries.
> We can reduce memory consumption by not loading the region-level 
> information for regions that are empty, controlled by a configuration.
> By default, all TableSplits would still be kept in memory (no change from 
> the current behavior), but the configuration can let the map-reduce job 
> ignore empty regions. The configuration can be set per MR job.
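
A minimal sketch of the proposed behavior, in plain Java with no HBase dependency. Everything named here is hypothetical: `RegionSplit` is a stand-in for a per-region split (it is not the HBase TableSplit class), and the `skipEmptyRegions` flag stands in for whatever configuration key the change ends up using. It only illustrates the idea that the default keeps every split while the opt-in setting drops splits whose region reports a size of zero.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class EmptyRegionSplitFilter {

    /** Hypothetical stand-in for a per-region split; NOT the HBase TableSplit class. */
    static final class RegionSplit {
        final String regionName;
        final long sizeBytes; // assumed to be the region's reported size in bytes

        RegionSplit(String regionName, long sizeBytes) {
            this.regionName = regionName;
            this.sizeBytes = sizeBytes;
        }
    }

    /**
     * Returns the splits the job should keep in memory.
     * With skipEmptyRegions == false (the assumed default) nothing is dropped;
     * with skipEmptyRegions == true, splits with a reported size of zero are skipped.
     */
    static List<RegionSplit> selectSplits(List<RegionSplit> all, boolean skipEmptyRegions) {
        if (!skipEmptyRegions) {
            return new ArrayList<>(all);
        }
        List<RegionSplit> kept = new ArrayList<>();
        for (RegionSplit split : all) {
            if (split.sizeBytes > 0) {
                kept.add(split);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<RegionSplit> splits = Arrays.asList(
            new RegionSplit("region-a", 1024L),
            new RegionSplit("region-b", 0L),
            new RegionSplit("region-c", 2048L));

        System.out.println("default:    " + selectSplits(splits, false).size() + " splits");
        System.out.println("skip empty: " + selectSplits(splits, true).size() + " splits");
    }
}
```

Note that a reported size of zero is only a proxy for "empty"; how the size is measured in the real input format is outside this sketch.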



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
