[jira] [Comment Edited] (HBASE-24859) Remove the empty regions from the hbase mapreduce splits

Andrew Kyle Purtell (Jira) Wed, 26 Aug 2020 10:15:29 -0700


    [ 
https://issues.apache.org/jira/browse/HBASE-24859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17185355#comment-17185355
 ]


Andrew Kyle Purtell edited comment on HBASE-24859 at 8/26/20, 5:14 PM:
-----------------------------------------------------------------------

bq. If we introduce a config property, something like 
hbase.mapreduce.ignore.empty.regions (make it an expert-level property), and 
default it to false so that we don't change any behavior, would that alleviate 
your concern?

Not really.

If you have so many empty regions that this actually works, why aren't you 
using the region normalizer as is the recommended practice?

If the problem is that a map or reduce task OOMEs because the representation of 
region locations is too large, then ignoring "empty" regions does not solve the 
root cause: once the table grows some more with non-empty regions, you will OOME 
again with no recourse. Why not raise memory for tasks? Why not try to optimize 
the memory usage of region locations for splits? I.e., address the root cause 
rather than put in a kludge. 
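
To make that concrete, here is a back-of-the-envelope sketch in plain Java. The 
heap budget and per-split footprint are made-up numbers, not measurements from 
HBase, and the class and method names are hypothetical. It only illustrates 
that dropping empty regions lowers the split count today without moving the 
ceiling.

```java
// Hypothetical model: every split kept in memory costs a fixed number of bytes.
final class SplitHeapBudgetSketch {

    /** Largest number of splits that fits in the budget (integer division). */
    static long maxSplits(long heapBudgetBytes, long bytesPerSplit) {
        return heapBudgetBytes / bytesPerSplit;
    }

    /** True if the splits that remain after dropping empty regions still fit. */
    static boolean fits(long totalRegions, long emptyRegions,
                        long heapBudgetBytes, long bytesPerSplit) {
        long splitsKept = totalRegions - emptyRegions;
        return splitsKept <= maxSplits(heapBudgetBytes, bytesPerSplit);
    }

    public static void main(String[] args) {
        long budget = 512L * 1024 * 1024; // assumed 512 MiB for split metadata
        long perSplit = 8L * 1024;        // assumed 8 KiB per split, illustrative only

        // Today: 100,000 regions, 60,000 of them empty, so 40,000 splits are kept.
        System.out.println(fits(100_000, 60_000, budget, perSplit));
        // Later: the table grows to 130,000 regions with the same 60,000 empty.
        System.out.println(fits(130_000, 60_000, budget, perSplit));
    }
}
```

Under these assumed numbers the first case fits and the second does not, even 
with the empty regions ignored in both.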

If the problem is that you have a fixed amount of heap available for your tasks 
but too many regions, again the solution is the region normalizer, because you 
will need to hold down the total number of regions (and/or optimize the heap 
representation of scan metadata) to live within the task memory limits. 

Finally, what is "empty" at split calculation time may not be empty at scan 
time, and there's no fix for that. 
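
A minimal sketch of that race, in plain Java with an in-memory map standing in 
for the table. Nothing here is the HBase API: the class, the methods, and the 
ignoreEmpty flag are hypothetical stand-ins for the proposed property. Splits 
are computed from a size snapshot, a write then lands in a region that looked 
empty, and the job never scans it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory stand-in for a table; not the HBase API.
final class StaleEmptyRegionSketch {

    /** Returns the regions to scan; drops zero-size regions only if ignoreEmpty is set. */
    static List<String> computeSplits(Map<String, Long> regionSizes, boolean ignoreEmpty) {
        List<String> splits = new ArrayList<>();
        for (Map.Entry<String, Long> e : regionSizes.entrySet()) {
            if (ignoreEmpty && e.getValue() == 0L) {
                continue; // region looked empty when splits were calculated
            }
            splits.add(e.getKey());
        }
        return splits;
    }

    /** Sums the current size of the regions that the splits actually cover. */
    static long bytesCovered(Map<String, Long> regionSizes, List<String> splits) {
        long total = 0;
        for (String region : splits) {
            total += regionSizes.get(region);
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, Long> sizes = new TreeMap<>();
        sizes.put("r1", 100L);
        sizes.put("r2", 0L); // empty at split calculation time
        sizes.put("r3", 50L);

        List<String> filtered = computeSplits(sizes, true);
        List<String> all = computeSplits(sizes, false);

        sizes.put("r2", 25L); // a write arrives before the scan runs

        System.out.println("covered with filter:    " + bytesCovered(sizes, filtered));
        System.out.println("covered without filter: " + bytesCovered(sizes, all));
    }
}
```

The filtered job misses whatever was written to r2 between split calculation 
and scan, and the job has no way to notice.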


was (Author: apurtell):
bq. If we introduce a config property, something like 
hbase.mapreduce.ignore.empty.regions (make it an expert-level property), and 
default it to false so that we don't change any behavior, would that alleviate 
your concern?

Not really.

If you have so many empty regions that this actually works, why aren't you 
using the region normalizer as is the recommended practice?

If the problem is that a map or reduce task OOMEs because the representation of 
region locations is too large, then ignoring "empty" regions does not solve the 
root cause: once the table grows some more with non-empty regions, you will OOME 
again with no recourse. Why not raise memory for tasks? Why not try to optimize 
the memory usage of region locations for splits? I.e., address the root cause 
rather than put in a kludge. 

> Remove the empty regions from the hbase mapreduce splits
> --------------------------------------------------------
>
>                 Key: HBASE-24859
>                 URL: https://issues.apache.org/jira/browse/HBASE-24859
>             Project: HBase
>          Issue Type: Improvement
>          Components: mapreduce
>            Reporter: Sandeep Pal
>            Assignee: Sandeep Pal
>            Priority: Major
>         Attachments: hbase-24859.png, screenshot-1.png
>
>
> It has been observed that when a table has too many regions, MR jobs 
> consume more memory in the client. This is because we keep region-level 
> information in memory, and the memory-heavy object is TableSplit because of 
> the Scan object that is part of it.
> We can optimize memory consumption by not loading the region-level 
> information for a region that is empty, based on a configuration setting.
> The default configuration would keep all TableSplits in memory (no change 
> from the current behavior), but the configuration can enable the map-reduce 
> job to ignore empty regions. The configuration can be set per MR job. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
