Pankaj Kumar commented on HBASE-9081:

Thanks [~jeason] for raising this Jira. 

Multiple split points (pre-split) can be defined only at table creation; 
thereafter a region can be split only into two daughter regions, either 
manually (using the HBaseAdmin APIs) or automatically (based on the split 
policy). Currently there is no way to split a region into multiple daughter 
regions in one step: the user needs to send multiple RPCs to retrieve the 
table's regions and send split requests.

Based on customer experience, there is a need to split a region at multiple 
points in a single operation. We could call this "Region Multi Split" instead 
of "Online split for an reserved empty region".

There are multiple scenarios where multi split would be very useful:
1) In the beginning the user can't predict the behavior of the incoming data, 
so the table is created with the default region (without pre-split). After 
some data has been loaded into the table, the user can predict the data 
distribution and define the split points efficiently. But currently, splitting 
the region into many regions (let's say 500) is not easy with the existing 
APIs: the user has to retrieve and split regions multiple times.

2) When the incoming data rate is very high, the current region split (2 
daughter regions) has to happen many times, which consumes a lot of I/O and 
CPU resources until the table reaches its desired number of regions (let's say 
500). With the new feature, a region can be split directly into the desired 
number of regions in a single operation.
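
To put a rough number on the second scenario, here is a minimal sketch in 
plain Java. It uses no HBase classes, and the class and method names are 
hypothetical, chosen only for illustration. It assumes every split is two-way, 
so each split adds exactly one region, and that in the best case every 
existing region splits in parallel in each round.

```java
// Hypothetical helper, not part of HBase: counts the work needed to reach a
// target region count when only two-way splits are available.
final class SplitCost {
    private SplitCost() {}

    /** Two-way split operations needed to grow one region into targetRegions. */
    public static int binarySplitOps(int targetRegions) {
        if (targetRegions < 1) {
            throw new IllegalArgumentException("targetRegions must be >= 1");
        }
        // Each two-way split replaces one region with two: net gain of one.
        return targetRegions - 1;
    }

    /** Minimum rounds if every existing region splits once per round. */
    public static int minBinaryRounds(int targetRegions) {
        if (targetRegions < 1) {
            throw new IllegalArgumentException("targetRegions must be >= 1");
        }
        int rounds = 0;
        long regions = 1;
        while (regions < targetRegions) {
            regions *= 2; // best case: the region count doubles each round
            rounds++;
        }
        return rounds;
    }

    public static void main(String[] args) {
        int target = 500;
        System.out.println("two-way splits needed: " + binarySplitOps(target));
        System.out.println("minimum split rounds:  " + minBinaryRounds(target));
    }
}
```

For 500 regions this gives 499 two-way splits spread over at least 9 rounds, 
compared with a single operation if a multi split were available.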

Let me know your thoughts on this; I will attach the design doc soon.

> Online split for an reserved empty region
> -----------------------------------------
>                 Key: HBASE-9081
>                 URL: https://issues.apache.org/jira/browse/HBASE-9081
>             Project: HBase
>          Issue Type: New Feature
>          Components: master, regionserver
>            Reporter: Jieshan Bean
>            Assignee: Jieshan Bean
>            Priority: Major
> We already have a region splitter tool. But it can only provide limited 
> functions:
> 1. Create a table with a specified number of regions without giving any 
> split points.
> 2. Rolling split on an existing region.
> We have the following user scenario: 
> The table was created with splits like below: 
> a____b____c____d____e____f____g____o
> g~o is a reserved empty region. It will be used only after some days, so we 
> don't know the rowkey distribution currently. We will split it only when it 
> gets used.
> Say we want to split g~o into 10 new regions, like g, g1, g2, g3, g4, 
> g5.......,g9, o.
> I didn't find that a similar function already exists. Please tell me if I am 
> wrong.
> Hope to hear your ideas on this:)

This message was sent by Atlassian JIRA
