[
https://issues.apache.org/jira/browse/HBASE-12452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
He Liangliang updated HBASE-12452:
----------------------------------
Release Note:
A custom RegionSplitPolicy that groups rows by a prefix of the row key.
The prefix is chosen by a regular expression match. This ensures that a
region is never split "inside" a row-key prefix, i.e. rows that share a
prefix stay co-located in one region.
As an example, if you have row keys formatted as
<code>salt_userid_eventtype_eventid</code> and you want to split rows only
between <code>userids</code>, the policy can be expressed as a regex like
<code>^[^_]+_[^_]+_</code> (assuming all parts are non-empty and do not
contain '_'). The regex string is interpreted using the ISO-8859-1 character
set, so any byte array can be supported; for example,
<code>^[^\x00]+\x00[^\x00]+\x00</code> splits after the second
<code>\x00</code> character.
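
The truncation described above can be sketched in plain Java as follows. This
is only an illustration and not the code from the attached patches: the class
name PrefixSplitSketch, the helper truncateToPrefix, and the sample keys are
hypothetical, and returning the key unchanged when the regex does not match is
an assumption.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PrefixSplitSketch {

    // Hypothetical helper (not an HBase API): truncate a candidate split
    // point to the prefix matched by the regex.
    public static byte[] truncateToPrefix(byte[] splitPoint, Pattern prefixPattern) {
        // ISO-8859-1 maps every byte 0x00-0xFF to exactly one char, so char
        // offsets in the decoded string equal byte offsets in the array.
        String decoded = new String(splitPoint, StandardCharsets.ISO_8859_1);
        Matcher m = prefixPattern.matcher(decoded);
        if (m.lookingAt()) {
            return Arrays.copyOf(splitPoint, m.end());
        }
        // Assumption: fall back to the untruncated key when nothing matches.
        return splitPoint;
    }

    public static void main(String[] args) {
        // Textual keys of the form salt_userid_eventtype_eventid.
        Pattern textual = Pattern.compile("^[^_]+_[^_]+_");
        byte[] key = "07_user42_click_9001".getBytes(StandardCharsets.ISO_8859_1);
        byte[] prefix = truncateToPrefix(key, textual);
        System.out.println(new String(prefix, StandardCharsets.ISO_8859_1)); // prints "07_user42_"

        // Binary keys whose parts are separated by 0x00 bytes.
        Pattern binary = Pattern.compile("^[^\\x00]+\\x00[^\\x00]+\\x00");
        byte[] binKey = {(byte) 0xC3, 0x01, 0x00, (byte) 0xFF, 0x00, 0x7A, 0x7B};
        System.out.println(truncateToPrefix(binKey, binary).length); // prints 5
    }
}
```

Because every split point is cut back to the end of the matched prefix, all
rows sharing that prefix sort at or after the split point and therefore land
in the same region.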
> Add regular expression based split policy
> -----------------------------------------
>
> Key: HBASE-12452
> URL: https://issues.apache.org/jira/browse/HBASE-12452
> Project: HBase
> Issue Type: Improvement
> Components: regionserver
> Reporter: He Liangliang
> Assignee: He Liangliang
> Priority: Minor
> Attachments: HBASE-12452-V2.patch, HBASE-12452-V2.patch,
> HBASE-12452-V3.patch, HBASE-12452-V3.patch, HBASE-12452-V4.patch,
> HBASE-12452.patch
>
>
> The current DelimitedKeyPrefixRegionSplitPolicy is not flexible enough to
> describe the split point prefix in some cases. A regex-based split policy
> is proposed, for example:
> ^[^\x00]+\x00[^\x00]+\x00
> means the split point will always be truncated to the prefix ending at the
> second \0 char.
> Binary string support is quite useful when the rowkey is encoded by a
> common data access library instead of being concatenated manually by the
> application developer.