[ 
https://issues.apache.org/jira/browse/HBASE-10501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13898513#comment-13898513
 ] 

stack commented on HBASE-10501:
-------------------------------

The original motivation was to split quickly at first, so that regions got 
farmed out across the cluster sooner and the full cluster got in on the action 
faster than if, say, you had to wait until one region hit the max size.

Was also hoping to keep the logic simple -- easy to understand -- and was 
trying to make it so you'd not have to touch this splitter going forward.

As you point out, 13 splits to reach max seems too many, especially if there 
are many tables on the one server.  Suggestions for when to split the first 
time all seem fine -- just pick one, I'd say.

What if we tripled rather than doubled?
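
To make the numbers in this thread concrete, here is a rough sketch of the 
sizing rule as the issue description quotes it (same-table region count, 
squared, times the flush size, capped at the max file size). The function 
names are hypothetical and this is not the actual HBase code, just the 
arithmetic under that reading of the rule.

```python
MB = 1024 * 1024
GB = 1024 * MB


def split_size(region_count, flush_size, max_file_size):
    """Split threshold per the quoted rule:
    min(region_count^2 * flush_size, max_file_size).
    (Hypothetical helper, not an HBase API.)"""
    return min(region_count ** 2 * flush_size, max_file_size)


def regions_to_reach_max(flush_size, max_file_size):
    """Smallest same-table region count on one server at which the
    squared rule reaches the max file size."""
    count = 1
    while count ** 2 * flush_size < max_file_size:
        count += 1
    return count


print(regions_to_reach_max(128 * MB, 20 * GB))  # 13
print(regions_to_reach_max(128 * MB, 10 * GB))  # 9
```

With a 128mb flush size, 12 regions give 144 * 128mb = 18gb, which is still 
under a 20gb cap, so the cap is only reached at 13 regions.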

> Make IncreasingToUpperBoundRegionSplitPolicy configurable
> ---------------------------------------------------------
>
>                 Key: HBASE-10501
>                 URL: https://issues.apache.org/jira/browse/HBASE-10501
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>
> During some (admittedly artificial) load testing we found a large amount of 
> split activity, which we tracked down to the 
> IncreasingToUpperBoundRegionSplitPolicy.
> The current logic is this (from the comments):
> "regions that are on this server that all are of the same table, squared, 
> times the region flush size OR the maximum region split size, whichever is 
> smaller"
> So with a flush size of 128mb and max file size of 20gb, we'd need 13 regions 
> of the same table on an RS to reach the max size.
> With a 10gb max file size it is still 9 regions of the same table.
> Considering that the number of regions that an RS can carry is limited and 
> there might be multiple tables, this should be more configurable.
> I think the squaring is smart and we do not need to change it.
> We could
> * Make the start size configurable and default it to the flush size
> * Add multiplier for the initial size, i.e. start with n * flushSize
> * Also change the default to start with 2*flush size
> Of course one can override the default split policy, but these seem like 
> simple tweaks.
> Or we could instead set the goal of how many regions of the same table would 
> need to be present in order to reach the max size. In that case we'd start 
> with maxSize/goal^2. So if max size is 20gb and the goal is three we'd start 
> with 20g/9 = 2.2g for the initial region size.
> [~stack], I'm especially interested in your opinion.
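
The goal-based proposal at the end of the description can be sketched as 
follows. This only illustrates the maxSize/goal^2 arithmetic; the helper 
names are made up for the example and do not correspond to any existing 
HBase setting or method.

```python
GB = 1024 ** 3


def initial_size_for_goal(max_file_size, goal):
    """Initial split size chosen so that `goal` same-table regions on a
    server reach the max size: max_file_size / goal^2.
    (Hypothetical helper, not an HBase API.)"""
    return max_file_size / goal ** 2


def split_size(region_count, initial_size, max_file_size):
    """Squared rule, but starting from a configurable initial size
    instead of the flush size."""
    return min(region_count ** 2 * initial_size, max_file_size)


initial = initial_size_for_goal(20 * GB, 3)
print(round(initial / GB, 1))  # 2.2
```

Under these assumptions the first split happens at about 2.2gb, the second 
region count gives about 8.9gb, and the 20gb cap is reached at three regions.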



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
