[ https://issues.apache.org/jira/browse/HBASE-4489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13116021#comment-13116021 ]

Jonathan Hsieh commented on HBASE-4489:
---------------------------------------

A few thoughts:

I agree with jgray -- I think one fix should correct the MD5 string split so 
that it splits across the full range from 0x00..0xff. A separate patch could 
then add the UniformSplit.
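As a rough illustration of the first fix, here is a minimal Java sketch. It assumes the MD5 string split treats keys as fixed-width uppercase hex strings and that "the full range" means 00000000 to FFFFFFFF; the class and method names are hypothetical, not RegionSplitter's actual code. Raising the upper bound from 7FFFFFFF to FFFFFFFF is the only change needed to spread split points over the whole hex space.

```java
import java.math.BigInteger;

// Hypothetical sketch, not the RegionSplitter API: keys are assumed to be
// fixed-width uppercase hex strings between minHex and maxHex.
class HexStringSplitSketch {

    // Returns numRegions - 1 split keys evenly spaced between minHex and maxHex,
    // each left-padded with zeros to the width of maxHex.
    static String[] splitPoints(String minHex, String maxHex, int numRegions) {
        if (numRegions < 2) {
            throw new IllegalArgumentException("need at least 2 regions");
        }
        BigInteger min = new BigInteger(minHex, 16);
        BigInteger range = new BigInteger(maxHex, 16).subtract(min);
        BigInteger n = BigInteger.valueOf(numRegions);
        int width = maxHex.length();
        String[] splits = new String[numRegions - 1];
        for (int i = 1; i < numRegions; i++) {
            // min + range * i / n, using integer (floor) division
            BigInteger key = min.add(range.multiply(BigInteger.valueOf(i)).divide(n));
            String hex = key.toString(16).toUpperCase();
            StringBuilder sb = new StringBuilder();
            for (int p = hex.length(); p < width; p++) {
                sb.append('0');
            }
            splits[i - 1] = sb.append(hex).toString();
        }
        return splits;
    }

    public static void main(String[] args) {
        // Old bounds: only the lower half of the hex space is ever used.
        System.out.println(String.join(",", splitPoints("00000000", "7FFFFFFF", 4)));
        // Full range, as suggested in the comment.
        System.out.println(String.join(",", splitPoints("00000000", "FFFFFFFF", 4)));
    }
}
```

With four regions the old bounds give 1FFFFFFF, 3FFFFFFF and 5FFFFFFF, so no split key lands above 7FFFFFFF; the full range gives 3FFFFFFF, 7FFFFFFF and BFFFFFFF.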

I'd be wary of changing the default, especially if this is meant to go into a 
0.90.x branch. It looks like users can already opt into the UniformSplit by 
changing the conf option.

Ideally, patches that add functionality or change semantics would also 
introduce corresponding tests. There were no tests for the previous code, and 
there are none for the newly introduced code. Adding tests, especially around 
edge cases, could address Ted's concerns, and it doesn't hurt to be extra 
defensive in code that isn't performance-sensitive.
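Edge-case tests of the kind suggested above could share a small validity check on whatever split keys an algorithm returns. The sketch below is hypothetical (SplitChecks and checkSplits are not HBase test utilities). It assumes split keys are compared as unsigned bytes, which is how HBase orders row keys; a signed comparison would wrongly sort \x80 before \x7F, exactly the kind of edge case a byte-space splitter can hit.

```java
// Hypothetical test helper, not part of HBase: validates split keys produced
// by any split algorithm before they are used to create a table.
class SplitChecks {

    // Unsigned lexicographic byte comparison (the ordering HBase uses for row keys).
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xFF;
            int y = b[i] & 0xFF;
            if (x != y) {
                return x - y;
            }
        }
        return a.length - b.length;
    }

    // Throws if the keys cannot describe numRegions distinct regions:
    // wrong count, an empty key (assumed invalid), or keys not strictly increasing.
    static void checkSplits(byte[][] splits, int numRegions) {
        if (splits.length != numRegions - 1) {
            throw new IllegalStateException(
                "expected " + (numRegions - 1) + " splits, got " + splits.length);
        }
        for (int i = 0; i < splits.length; i++) {
            if (splits[i].length == 0) {
                throw new IllegalStateException("empty split key at index " + i);
            }
            if (i > 0 && compareUnsigned(splits[i - 1], splits[i]) >= 0) {
                throw new IllegalStateException("split keys not strictly increasing at index " + i);
            }
        }
    }

    public static void main(String[] args) {
        // Passes: one-byte versions of the five-region split keys from the issue.
        checkSplits(new byte[][] {{0x33}, {0x66}, {(byte) 0x99}, {(byte) 0xCC}}, 5);
        // \x7F sorts before \x80 when bytes are compared unsigned.
        System.out.println(compareUnsigned(new byte[] {0x7F}, new byte[] {(byte) 0x80}) < 0);
    }
}
```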


                
> Better key splitting in RegionSplitter
> --------------------------------------
>
>                 Key: HBASE-4489
>                 URL: https://issues.apache.org/jira/browse/HBASE-4489
>             Project: HBase
>          Issue Type: Improvement
>    Affects Versions: 0.90.4
>            Reporter: Dave Revell
>            Assignee: Dave Revell
>         Attachments: HBASE-4489-branch0.90-v1.patch, HBASE-4489-trunk-v1.patch
>
>
> The RegionSplitter utility allows users to create a pre-split table from the 
> command line or do a rolling split on an existing table. It supports 
> pluggable split algorithms that implement the SplitAlgorithm interface. The 
> only/default SplitAlgorithm is one that assumes keys fall in the range from 
> ASCII string "00000000" to ASCII string "7FFFFFFF". This is not a sane 
> default, and seems useless to most users. Users are likely to be surprised 
> that all the region splits occur in the byte range of ASCII 
> characters.
>
> A better default split algorithm would be one that evenly divides the space 
> of all bytes, which is what this patch does. Making a table with five regions 
> would split at \x33\x33..., \x66\x66..., \x99\x99..., and \xCC\xCC..., with 
> the last region running up to \xFF\xFF.
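The evenly divided byte space described in the issue can be sketched in Java as below. It assumes fixed 8-byte split keys read as unsigned integers from 0x00...00 to 0xFF...FF; UniformSplitSketch is a hypothetical name, not the class in the attached patches. Five regions need only four split keys, and because 2^64 - 1 is divisible by 5, the keys come out as exact repeating bytes.

```java
import java.math.BigInteger;

// Hypothetical sketch, not the patch's UniformSplit: treats 8-byte row keys as
// unsigned integers and divides [0x00..00, 0xFF..FF] into equal-width regions.
class UniformSplitSketch {

    static final int KEY_BYTES = 8;
    static final BigInteger MAX =
        BigInteger.ONE.shiftLeft(8 * KEY_BYTES).subtract(BigInteger.ONE);

    // Returns numRegions - 1 split keys at MAX * i / numRegions.
    static byte[][] split(int numRegions) {
        if (numRegions < 2) {
            throw new IllegalArgumentException("need at least 2 regions");
        }
        BigInteger n = BigInteger.valueOf(numRegions);
        byte[][] splits = new byte[numRegions - 1][];
        for (int i = 1; i < numRegions; i++) {
            BigInteger key = MAX.multiply(BigInteger.valueOf(i)).divide(n);
            splits[i - 1] = toFixedBytes(key);
        }
        return splits;
    }

    // BigInteger.toByteArray() is signed and variable-length (it may add a
    // leading 0x00), so copy the low KEY_BYTES bytes into a zero-padded array.
    static byte[] toFixedBytes(BigInteger v) {
        byte[] raw = v.toByteArray();
        byte[] out = new byte[KEY_BYTES];
        int copy = Math.min(raw.length, KEY_BYTES);
        System.arraycopy(raw, raw.length - copy, out, KEY_BYTES - copy, copy);
        return out;
    }

    // Uppercase hex rendering of a key, for display.
    static String hex(byte[] b) {
        StringBuilder sb = new StringBuilder();
        for (byte x : b) {
            sb.append(String.format("%02X", x & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        for (byte[] s : split(5)) {
            System.out.println(hex(s));
        }
    }
}
```

Running main prints 3333333333333333, 6666666666666666, 9999999999999999 and CCCCCCCCCCCCCCCC; the fifth region simply runs from the last key to the end of the key space.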

--
This message is automatically generated by JIRA.

        
