[ https://issues.apache.org/jira/browse/HBASE-25302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiaolin Ha updated HBASE-25302:
-------------------------------
    Description: 
At our company, MeiTuan, we ran into a problem: on a production cluster with 
more than 10000 regions, the split switch was turned off by mistake. By the time 
we could do anything about it, more than 10000 regions had grown larger than 
600GB. Because the cluster serves more than 80000 read+write QPS, splitting all 
of the large regions at once with the current split method would consume a lot 
of IO and CPU, especially since every region has to be split more than once.

Fortunately, we use stripe compaction in most of our production clusters. 
Following the design doc in https://issues.apache.org/jira/browse/HBASE-7667 , 
we have implemented a fast region split method based on HFileLinks, together 
with a method for moving HFiles between regions of the same table. In fact, this 
idea was already mentioned in HBASE-7667 , which says that `region splits become 
marvelously simple (if we could move files between regions, no references would 
be needed)`.
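To make the HBASE-7667 idea concrete, here is a minimal Python sketch, assuming each stripe covers a contiguous key range [start, end) and that split keys are chosen only at stripe boundaries. The classes `Stripe` and `Daughter` and the function `split_at_stripe_boundaries` are hypothetical illustrations, not HBase APIs, and this is not the implementation from the design doc; L0 files are left out on purpose.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Stripe:
    """Hypothetical stripe: keys in [start_key, end_key) plus its HFile names.
    An empty key means the start or end of the region."""
    start_key: bytes
    end_key: bytes
    files: list[str] = field(default_factory=list)


@dataclass
class Daughter:
    """Hypothetical daughter region and the stripe files it would link to."""
    start_key: bytes
    end_key: bytes
    linked_files: list[str]


def split_at_stripe_boundaries(stripes: list[Stripe], num_daughters: int) -> list[Daughter]:
    """Group consecutive stripes into daughters, putting every split key on a
    stripe boundary. Since no split key falls inside a stripe, each stripe file
    lies entirely within one daughter and could be linked (e.g. via an HFileLink)
    instead of being rewritten or split with reference files. L0 files, which
    may span the whole region, are not handled here."""
    if not 1 <= num_daughters <= len(stripes):
        raise ValueError("need between 1 and len(stripes) daughters")
    daughters = []
    for d in range(num_daughters):
        # Spread stripes as evenly as possible; each group is non-empty.
        lo = d * len(stripes) // num_daughters
        hi = (d + 1) * len(stripes) // num_daughters
        group = stripes[lo:hi]
        files = [f for s in group for f in s.files]
        daughters.append(Daughter(group[0].start_key, group[-1].end_key, files))
    return daughters


if __name__ == "__main__":
    # Eight stripes with boundaries b"b" .. b"h".
    keys = [b"", b"b", b"c", b"d", b"e", b"f", b"g", b"h", b""]
    stripes = [Stripe(keys[i], keys[i + 1], [f"hfile-{i}"]) for i in range(8)]
    for daughter in split_at_stripe_boundaries(stripes, 2):
        print(daughter.start_key, daughter.end_key, daughter.linked_files)
```

Choosing split keys only at stripe boundaries is what lets every stripe file be handed to exactly one daughter without reading or rewriting its data.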

This issue focuses on the fast split method.

Details are in the design doc:

[https://docs.google.com/document/d/1hzBMdEFCckw18RE-kQQCe2ArW0MXhmLiiqyqpngItBM/edit?usp=sharing]

The method is simple and efficient. We have implemented all the ideas described 
in the design doc and run them on our production clusters. A region of about 
600G can be split into 8 regions of 75G each in about five minutes, with less 
than 5G rewritten in total (all of it in L0) over the whole process, whereas 
normal repeated splitting needs 600G*3=1800G (three rounds of halving, 
600G -> 300G -> 150G -> 75G, each rewriting the full 600G). If HFileLinks within 
the same table are moved instead, the rewritten size is less than 50G (two 
stripe sizes), because rebuilding the HFileLinks into stripes may insert some 
files into L0.
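The arithmetic behind the 1800G figure for normal splits can be sketched as follows, assuming each split halves a region and that every round eventually rewrites the whole region's data once when the daughters compact away their reference files. The function name `repeated_binary_split_cost` is hypothetical.

```python
from __future__ import annotations


def repeated_binary_split_cost(region_gb: float, target_gb: float) -> tuple[int, int, float]:
    """Return (rounds, daughters, rewritten_gb) for halving a region until each
    daughter is at most target_gb, assuming each round rewrites all data once."""
    if target_gb <= 0:
        raise ValueError("target_gb must be positive")
    rounds, size = 0, region_gb
    while size > target_gb:
        size /= 2  # one round of splitting every region in half
        rounds += 1
    return rounds, 2 ** rounds, rounds * region_gb


print(repeated_binary_split_cost(600, 75))  # (3, 8, 1800)
```

Under the same assumption, the rewrite volume grows with the number of halving rounds, while the HFileLink-based split reported above rewrites only a few gigabytes of L0 data regardless of how many daughters are produced.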

I will upload two images showing a RegionServer (RS) before and after all of its 
regions were split with this method.

We are willing to contribute the code to the community. This idea can be used 
not only with the stripe store engine but also with the default store engine, 
and it can also greatly benefit region merges. If anyone is interested in this 
issue, please let me know. Thanks.



> Fast split regions with stripe store engine
> -------------------------------------------
>
>                 Key: HBASE-25302
>                 URL: https://issues.apache.org/jira/browse/HBASE-25302
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Xiaolin Ha
>            Assignee: Xiaolin Ha
>            Priority: Major
>         Attachments: after-split.png, before-split.png



--
This message was sent by Atlassian Jira
(v8.3.4#803005)