[jira] [Updated] (HBASE-12452) Add regular expression based split policy

He Liangliang (JIRA) Thu, 21 May 2015 02:51:38 -0700

     [ 
https://issues.apache.org/jira/browse/HBASE-12452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


He Liangliang updated HBASE-12452:
----------------------------------
    Release Note: 
A custom RegionSplitPolicy that groups rows by a prefix of the row key. The
prefix is determined by a regular expression match.

This ensures that a region is never split "inside" a row-key prefix, i.e. rows
that share a prefix are co-located in the same region.

For example, if row keys are formatted as
<code>salt_userid_eventtype_eventid</code> and regions should only be split
between <code>userid</code>s, the policy can be expressed with a regex such as
<code>^[^_]+_[^_]+_</code> (assuming all parts are non-empty and do not contain
'_'). The regex string is interpreted using the ISO-8859-1 character set, so
any byte array can be supported. For example,
<code>^[^\x00]+\x00[^\x00]+\x00</code> splits after the second
<code>\x00</code> byte.
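
The prefix computation described above can be sketched in plain Java as
follows. This is an illustration only, not the code from the attached patch:
the class name RegexPrefixSketch, the helper splitPointPrefix, the sample keys,
and the choice to fall back to the whole row key when the regex does not match
are all assumptions made for the example.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexPrefixSketch {

    // Hypothetical helper (not an HBase API): returns the leading bytes of
    // rowKey matched by prefixPattern. Falling back to the full key when
    // nothing matches is an assumption of this sketch.
    public static byte[] splitPointPrefix(byte[] rowKey, Pattern prefixPattern) {
        // ISO-8859-1 maps each byte 0x00-0xFF to exactly one char, so char
        // offsets in the decoded string equal byte offsets in the row key.
        String decoded = new String(rowKey, StandardCharsets.ISO_8859_1);
        Matcher m = prefixPattern.matcher(decoded);
        if (m.lookingAt()) {
            return Arrays.copyOf(rowKey, m.end());
        }
        return rowKey;
    }

    public static void main(String[] args) {
        // Text keys: keep salt and userid together.
        Pattern byUser = Pattern.compile("^[^_]+_[^_]+_");
        byte[] textKey = "07_alice_click_000123".getBytes(StandardCharsets.ISO_8859_1);
        byte[] textPrefix = splitPointPrefix(textKey, byUser);
        System.out.println(new String(textPrefix, StandardCharsets.ISO_8859_1));

        // Binary keys: truncate after the second 0x00 byte.
        Pattern byNul = Pattern.compile("^[^\\x00]+\\x00[^\\x00]+\\x00");
        byte[] binKey = {(byte) 0xC3, 0x01, 0x00, (byte) 0xFF, 0x00, 0x41, 0x00, 0x42};
        byte[] binPrefix = splitPointPrefix(binKey, byNul);
        System.out.println(binPrefix.length);
    }
}
```

Because the decoding is one byte per char, bytes above 0x7F (such as 0xC3 and
0xFF in the binary key) pass through the regex engine unchanged, which is what
makes the regex usable on arbitrary byte arrays.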

> Add regular expression based split policy
> -----------------------------------------
>
>                 Key: HBASE-12452
>                 URL: https://issues.apache.org/jira/browse/HBASE-12452
>             Project: HBase
>          Issue Type: Improvement
>          Components: regionserver
>            Reporter: He Liangliang
>            Assignee: He Liangliang
>            Priority: Minor
>         Attachments: HBASE-12452-V2.patch, HBASE-12452-V2.patch, 
> HBASE-12452-V3.patch, HBASE-12452-V3.patch, HBASE-12452-V4.patch, 
> HBASE-12452.patch
>
>
> The current DelimitedKeyPrefixRegionSplitPolicy is not flexible enough to
> describe the split point prefix in some cases. A regex-based split policy is
> proposed. For example:
>     ^[^\x00]+\x00[^\x00]+\x00
> means the split point will always be truncated to the prefix ending at the
> second \0 char.
> Binary string support is quite useful when the rowkey is encoded by a common
> data access library instead of being concatenated manually by the application
> developer.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
