[ https://issues.apache.org/jira/browse/HBASE-4489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13125118#comment-13125118 ]

Jonathan Hsieh commented on HBASE-4489:
---------------------------------------

@Dave

Part of me would really prefer to decouple rollingSplit from the presplit
min/max value selection. Maybe split this into two programs: a custom
presplit table generator that handles key bounds, and a separate
rollingSplit program that just splits based on given key ranges.
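To make the "just split a given key range" half concrete, here is a minimal sketch of that step with no knowledge of table-wide min/max bounds. It assumes fixed-width keys interpreted as unsigned big-endian integers; RangeMidpoint, midpoint and toFixedWidth are hypothetical names for illustration, not RegionSplitter's API.

```java
import java.math.BigInteger;
import java.util.Arrays;

// Hypothetical sketch: a rolling-split step that only needs the start and end
// keys of one region. Assumes both keys have the same fixed width and are read
// as unsigned big-endian integers.
class RangeMidpoint {

    // Interpret a byte[] as an unsigned big-endian integer.
    static BigInteger toUnsigned(byte[] key) {
        return new BigInteger(1, key);
    }

    // Convert back to exactly `width` bytes, left-padded with zeros and with
    // any leading sign byte from toByteArray() dropped.
    static byte[] toFixedWidth(BigInteger value, int width) {
        byte[] raw = value.toByteArray();
        byte[] out = new byte[width];
        int copy = Math.min(raw.length, width);
        System.arraycopy(raw, raw.length - copy, out, width - copy, copy);
        return out;
    }

    // Split point halfway between start and end (assumes start < end).
    static byte[] midpoint(byte[] start, byte[] end) {
        BigInteger mid = toUnsigned(start).add(toUnsigned(end)).shiftRight(1);
        return toFixedWidth(mid, start.length);
    }

    public static void main(String[] args) {
        byte[] start = {(byte) 0x00, (byte) 0x00};
        byte[] end = {(byte) 0xFF, (byte) 0xFF};
        // Midpoint of 0x0000 and 0xFFFF is 0x7FFF.
        System.out.println(Arrays.toString(midpoint(start, end)));
    }
}
```

Because this step never looks at global key bounds, the bounds logic could live entirely in the separate presplit generator.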

I thought there was agreement that we would keep MD5StringSplit as the default
for 0.90, but it looks like both patches change the default from MD5StringSplit
to UniformSplit. While I generally agree with your point #3, MD5StringSplit is
the default in 0.90, and changing it would be a compatibility problem for anyone
who depends on it. Would it make sense to change the default in trunk/0.92 (I'm
fine with that) but leave 0.90.x as is?

Nice functional test. Did you consider adding a unit test on the split
algorithm alongside the functional test that spins up a cluster? I believe
HBaseAdmin.createTable(HTableDescriptor desc, byte[][] splitKeys) is well
tested, and relying on it would make the non-@Ignore'd portions quicker. I can
see why you need the cluster setup for testing rollingSplit, though.

Interesting divide-by-zero bug. More testing, fewer surprises!

Any reason why testCreatePresplitTable goes to -0x71, 0x81 .. -0x11 instead of
just 0x8f, 0x9f .. 0xff? Though more verbose, I think it would be easier to read
and follow if you used "positive" hex literals and cast each one with (byte), or
wrote out single longs and converted them.
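A small sketch of the two suggested styles, since Java's byte is signed and any literal above 0x7F needs a cast. The class name and the values are illustrative assumptions, not the patch's actual test data. Inside an HBase test, Bytes.toBytes(long) would do the same big-endian conversion; ByteBuffer is used here only to keep the sketch dependency-free.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Illustrative sketch: two readable ways to write split-key bytes above 0x7F.
class HexSplitKeys {

    // Option 1: write "positive" hex literals and cast each one to byte.
    static byte[] castLiterals() {
        return new byte[] {(byte) 0x8f, (byte) 0x9f, (byte) 0xaf, (byte) 0xbf};
    }

    // Option 2: write the whole key as one long and convert it to
    // 8 big-endian bytes (ByteBuffer's default byte order).
    static byte[] fromLong(long value) {
        return ByteBuffer.allocate(Long.BYTES).putLong(value).array();
    }

    public static void main(String[] args) {
        // (byte) 0x8f and -0x71 are the same signed byte value, -113.
        System.out.println((byte) 0x8f == -0x71); // true
        System.out.println(Arrays.toString(castLiterals()));
        System.out.println(Arrays.toString(fromLong(0x8f9fafbfcfdfefffL)));
    }
}
```

Either way the reader sees the unsigned key layout directly, instead of having to mentally convert negative hex values.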


                
> Better key splitting in RegionSplitter
> --------------------------------------
>
>                 Key: HBASE-4489
>                 URL: https://issues.apache.org/jira/browse/HBASE-4489
>             Project: HBase
>          Issue Type: Improvement
>    Affects Versions: 0.90.4
>            Reporter: Dave Revell
>            Assignee: Dave Revell
>         Attachments: HBASE-4489-branch0.90-v1.patch, 
> HBASE-4489-branch0.90-v2.patch, HBASE-4489-trunk-v1.patch, 
> HBASE-4489-trunk-v2.patch
>
>
> The RegionSplitter utility allows users to create a pre-split table from the 
> command line or do a rolling split on an existing table. It supports 
> pluggable split algorithms that implement the SplitAlgorithm interface. The 
> only/default SplitAlgorithm is one that assumes keys fall in the range from 
> ASCII string "00000000" to ASCII string "7FFFFFFF". This is not a sane 
> default, and seems useless to most users. Users are likely to be surprised by 
> the fact that all the region splits occur in the byte range of ASCII
> characters.
> A better default split algorithm would be one that evenly divides the space 
> of all bytes, which is what this patch does. For a table with five regions,
> the region boundaries would be \x33\x33..., \x66\x66..., \x99\x99...,
> \xCC\xCC..., and \xFF\xFF.
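The even division described above can be sketched with BigInteger arithmetic over fixed-width keys. This is an illustration of the idea under the assumption of fixed-width keys, not the UniformSplit code from the patch; EvenByteSplits and its methods are hypothetical names.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: evenly divide the whole unsigned byte space of
// keyWidth-byte keys into numRegions regions.
class EvenByteSplits {

    // Returns the numRegions - 1 boundaries between regions.
    static List<byte[]> splitPoints(int numRegions, int keyWidth) {
        if (numRegions < 2) {
            throw new IllegalArgumentException("need at least 2 regions");
        }
        // Largest key: keyWidth bytes of 0xFF, i.e. 2^(8 * keyWidth) - 1.
        BigInteger max = BigInteger.ONE.shiftLeft(8 * keyWidth).subtract(BigInteger.ONE);
        List<byte[]> splits = new ArrayList<>();
        for (int i = 1; i < numRegions; i++) {
            BigInteger point = max.multiply(BigInteger.valueOf(i))
                                  .divide(BigInteger.valueOf(numRegions));
            splits.add(toFixedWidth(point, keyWidth));
        }
        return splits;
    }

    // Left-pad to exactly `width` bytes, dropping any leading sign byte.
    static byte[] toFixedWidth(BigInteger value, int width) {
        byte[] raw = value.toByteArray();
        byte[] out = new byte[width];
        int copy = Math.min(raw.length, width);
        System.arraycopy(raw, raw.length - copy, out, width - copy, copy);
        return out;
    }

    // Render a key in the \xNN style used in the description.
    static String hex(byte[] key) {
        StringBuilder sb = new StringBuilder();
        for (byte b : key) {
            sb.append(String.format("\\x%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        for (byte[] split : splitPoints(5, 2)) {
            System.out.println(hex(split));
        }
    }
}
```

With five regions and 2-byte keys this yields four boundaries, \x33\x33, \x66\x66, \x99\x99 and \xCC\xCC; the last region runs up to \xFF\xFF.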

--
This message is automatically generated by JIRA.
