This is why I created HBASE-12853. 

So you don’t have to specify a custom split policy. 

Of course the simple solutions are often passed over because of NIH.  ;-) 

To be blunt: you encapsulate the bucketing code so that you have a single API 
into HBase regardless of the type of storage underneath. 
KISS is maintained and you stop people from attempting to do stupid things. 
(cc’ing dev@hbase) As a product owner (read: PMC / committers), you want to keep 
people from mucking about in the internals. While it’s true that it’s open 
source, and you will have some who want to muck around, you also have to 
consider the corporate users who need something that is reliable and less 
customized so that it’s supportable. This is the vendor’s dilemma (hint: 
Cloudera, Hortonworks, IBM, MapR). You’re selling support for HBase, and if a 
customer starts to overload internals with their own code, good luck 
supporting it. This is why you do things like 12853: it makes your life 
easier. 
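
To make "encapsulate the bucketing code" concrete, here is a minimal sketch of the general idea. It is not taken from HBASE-12853 and it does not call any HBase API. The bucket count, the "NN#" prefix format, and every function name below are assumptions made up for illustration. The caller only ever sees the logical timestamp#guid key; the bucket prefix is added on write, stripped on read, and a time-range scan is fanned out to one range per bucket.

```python
import hashlib

# Hypothetical bucket count; in practice it would be fixed at table-design time.
NUM_BUCKETS = 16


def bucket_for(row_key: bytes, num_buckets: int = NUM_BUCKETS) -> int:
    """Derive a stable bucket id from the full logical row key."""
    digest = hashlib.md5(row_key).digest()
    return int.from_bytes(digest[:4], "big") % num_buckets


def to_stored_key(row_key: bytes, num_buckets: int = NUM_BUCKETS) -> bytes:
    """Prefix the logical key with a zero-padded bucket id, e.g. b'07#...'."""
    return b"%02d#" % bucket_for(row_key, num_buckets) + row_key


def to_logical_key(stored_key: bytes) -> bytes:
    """Strip the bucket prefix so callers never see it."""
    return stored_key.split(b"#", 1)[1]


def scan_ranges(start: bytes, stop: bytes, num_buckets: int = NUM_BUCKETS):
    """Return one (start, stop) pair per bucket for a logical key range.

    A reader has to scan every bucket and merge the results, because rows
    from the same time window are spread across all prefixes.
    """
    return [(b"%02d#" % i + start, b"%02d#" % i + stop)
            for i in range(num_buckets)]


if __name__ == "__main__":
    key = b"2015-05-22 00:02:01#AB12EC77778888945"
    stored = to_stored_key(key)
    print(stored, to_logical_key(stored) == key)
    print(len(scan_ranges(b"2015-05-22 00:00:00", b"2015-05-23 00:00:00")))
```

The point of the design is that consecutive timestamps land in different buckets, so writes spread across regions without a custom split policy, and nobody outside this one wrapper needs to know the prefix exists.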

This isn’t a sexy solution. It’s core engineering work. 

HTH

-Mike

> On May 22, 2015, at 4:22 AM, Shushant Arora <[email protected]> wrote:
> 
> Since the custom split policy is based on the second part (i.e. the guid),
> which region will a key whose first part is 2015-05-22 00:01:02 be in, and
> how will that be identified?
> 
> 
> On Fri, May 22, 2015 at 1:12 PM, Ted Yu <[email protected]> wrote:
> 
>> The custom split policy needs to respect the fact that timestamp is the
>> leading part of the rowkey.
>> 
>> This would avoid the overlap you mentioned.
>> 
>> Cheers
>> 
>> 
>> 
>>> On May 21, 2015, at 11:55 PM, Shushant Arora <[email protected]> wrote:
>>> 
>>> The guid changes with every key; the pattern is
>>> 2015-05-22 00:02:01#AB12EC77778888945
>>> 2015-05-22 00:02:02#CD9870001234AB457
>>> 
>>> When we specify a custom split algorithm, can it happen that keys in the
>>> same sort-order range, say (1-7), lie in region R1 as well as in region R2?
>>> Then how will the .META. table handle lookups at read time? Say I search
>>> for key 3: will it search in both regions R1 and R2?
>>> 
>>>> On Fri, May 22, 2015 at 10:48 AM, Ted Yu <[email protected]> wrote:
>>>> 
>>>> Does guid change with every key ?
>>>> 
>>>> bq. use second part of key
>>>> 
>>>> I don't think so. Suppose first row in the parent region is
>>>> '1432104178817#321'. After split, the first row in first daughter region
>>>> would still be '1432104178817#321'. Right ?
>>>> 
>>>> Cheers
>>>> 
>>>> On Thu, May 21, 2015 at 9:57 PM, Shushant Arora <[email protected]> wrote:
>>>> 
>>>>> Can I avoid region hotspotting with a custom region split policy in
>>>>> HBase 0.96?
>>>>> 
>>>>> The key is of the form timestamp#guid.
>>>>> So can I have a custom region split policy that uses the second part of
>>>>> the key (i.e. the guid) as the split criterion, and so avoid the hotspot?
>>>> 
>> 
