[ 
https://issues.apache.org/jira/browse/HBASE-4365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13215080#comment-13215080
 ] 

Jean-Daniel Cryans commented on HBASE-4365:
-------------------------------------------

Conclusion for the 1TB upload:

Flush size: 512MB
Split size: 20GB

Without patch:
18012s

With patch:
12505s

It's 1.44x faster (18012s / 12505s), so a huge improvement. The difference 
comes from the fact that, without the patch, it takes an awfully long time to 
split the first few regions. In the past I would start the test with a smaller 
split size and then, once I had a good distribution, do an online alter to set 
it to 20GB. Not anymore with this patch :)
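
For anyone who wants to re-derive the numbers, here is a minimal sketch of the 
arithmetic behind the two timings above. The class and method names are made up 
for illustration, and the throughput figures assume that "1TB" means 2^20 MB, 
which the comment does not state.

```java
public class UploadSpeedup {
    // Ratio of baseline to patched wall-clock time; above 1 means the patch is faster.
    static double speedup(double baselineSeconds, double patchedSeconds) {
        return baselineSeconds / patchedSeconds;
    }

    // Average ingest rate in MB/s for a given payload size and duration.
    static double throughputMBps(double totalMB, double seconds) {
        return totalMB / seconds;
    }

    public static void main(String[] args) {
        double totalMB = 1024.0 * 1024.0; // assumption: 1TB = 2^20 MB
        double withoutPatch = 18012;      // seconds, from the test above
        double withPatch = 12505;         // seconds, from the test above

        System.out.printf("speedup: %.2fx%n", speedup(withoutPatch, withPatch));
        System.out.printf("time saved: %.0fs%n", withoutPatch - withPatch);
        System.out.printf("rate without patch: %.1f MB/s%n",
                throughputMBps(totalMB, withoutPatch));
        System.out.printf("rate with patch: %.1f MB/s%n",
                throughputMBps(totalMB, withPatch));
    }
}
```

Under that size assumption, the average ingest rate goes from roughly 58 MB/s 
to roughly 84 MB/s.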

Another observation: the upload in general is slowed down by "too many store 
files" blocking. I traced this to compactions taking a long time to get rid of 
reference files (3.5GB taking more than 10 minutes), and during that time you 
can hit the block multiple times. We really ought to see how we can optimize 
the compactions: consider compacting those big files in many threads instead 
of only one, and enable referencing reference files so that some compactions 
can be skipped altogether.
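
To put the compaction figure in perspective, here is a rough sketch of what 
"3.5GB in more than 10 minutes" implies. The names are hypothetical. The 
90-second blocking wait is an assumption (I believe it matches the usual 
default of hbase.hstore.blockingWaitTime, but it was not measured in this 
test), so treat the window count as an illustration only.

```java
public class CompactionBlockEstimate {
    // Average compaction rate in MB/s. Since the compaction took "more than"
    // the given time, the result is an upper bound on the real rate.
    static double compactionRateMBps(double sizeMB, double seconds) {
        return sizeMB / seconds;
    }

    // Number of full blocking-wait windows that fit inside one compaction.
    static long blockWindows(long compactionSeconds, long blockingWaitSeconds) {
        return compactionSeconds / blockingWaitSeconds;
    }

    public static void main(String[] args) {
        double sizeMB = 3.5 * 1024;     // 3.5GB of reference files, from the comment
        long compactionSeconds = 600;   // "more than 10 minutes", from the comment
        long blockingWaitSeconds = 90;  // ASSUMPTION: not stated in the comment

        System.out.printf("compaction rate: at most %.2f MB/s%n",
                compactionRateMBps(sizeMB, compactionSeconds));
        System.out.printf("blocking windows per compaction: at least %d%n",
                blockWindows(compactionSeconds, blockingWaitSeconds));
    }
}
```

With these inputs the compaction runs at no more than about 6 MB/s, and a 
single compaction spans several blocking windows, which is consistent with 
hitting the block multiple times.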
                
> Add a decent heuristic for region size
> --------------------------------------
>
>                 Key: HBASE-4365
>                 URL: https://issues.apache.org/jira/browse/HBASE-4365
>             Project: HBase
>          Issue Type: Improvement
>    Affects Versions: 0.92.1, 0.94.0
>            Reporter: Todd Lipcon
>            Priority: Critical
>              Labels: usability
>         Attachments: 4365-v2.txt, 4365.txt
>
>
> A few of us were brainstorming this morning about what the default region 
> size should be. There were a few general points made:
> - in some ways it's better to be too-large than too-small, since you can 
> always split a table further, but you can't merge regions currently
> - with HFile v2 and multithreaded compactions there are fewer reasons to 
> avoid very-large regions (10GB+)
> - for small tables you may want a small region size just so you can 
> distribute load better across a cluster
> - for big tables, multi-GB is probably best


        
