[ https://issues.apache.org/jira/browse/HBASE-4365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13211619#comment-13211619 ]
Lars Hofhansl commented on HBASE-4365:
--------------------------------------
Wouldn't we potentially do a lot of splitting when there are many regionservers?
(Maybe I am not grokking this fully)
If we take the square of the number of regions, and say we have 10gb
regions and a flush size of 128mb, we'd reach the 10gb only after there are 9
regions of the table on the same regionserver.
We were planning a region size of 5gb and a flush size of 256mb; that would still
be 5 regions.
(10gb/128mb ~ 78, 5gb/256mb ~ 19; the square roots round up to 9 and 5)
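To spell out that arithmetic, here is a minimal sketch. It assumes the proposed heuristic sets the split size to the flush size times the square of the number of the table's regions on the same regionserver, capped at the configured region size. The function names are made up for illustration and are not HBase APIs, and sizes use decimal units (1gb = 1000mb) to match the ratios above.

```python
def split_size_mb(regions_on_server, flush_size_mb, max_region_size_mb):
    """Assumed heuristic: flush size * regions^2, capped at the max region size."""
    return min(max_region_size_mb, flush_size_mb * regions_on_server ** 2)


def regions_until_max(flush_size_mb, max_region_size_mb):
    """Smallest region count at which the assumed heuristic reaches the cap."""
    n = 1
    while flush_size_mb * n ** 2 < max_region_size_mb:
        n += 1
    return n


# 10gb regions with a 128mb flush size
print(regions_until_max(128, 10_000))  # 9
# 5gb regions with a 256mb flush size
print(regions_until_max(256, 5_000))   # 5
```

Under this assumption, every region below that count splits at a size smaller than the configured maximum, which is where the concern about extra splitting comes from.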
> Add a decent heuristic for region size
> --------------------------------------
>
> Key: HBASE-4365
> URL: https://issues.apache.org/jira/browse/HBASE-4365
> Project: HBase
> Issue Type: Improvement
> Affects Versions: 0.94.0, 0.92.1
> Reporter: Todd Lipcon
> Priority: Critical
> Labels: usability
> Attachments: 4365.txt
>
>
> A few of us were brainstorming this morning about what the default region
> size should be. There were a few general points made:
> - in some ways it's better to be too-large than too-small, since you can
> always split a table further, but you can't merge regions currently
> - with HFile v2 and multithreaded compactions there are fewer reasons to
> avoid very-large regions (10GB+)
> - for small tables you may want a small region size just so you can
> distribute load better across a cluster
> - for big tables, multi-GB is probably best