[ 
https://issues.apache.org/jira/browse/HBASE-3922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13039505#comment-13039505
 ] 

stack commented on HBASE-3922:
------------------------------

This looks like a useful addition.  Here are some comments on the patch:

Please do not make unorthodox changes.  You move the package definition from 
after the license to before (Maybe your IDE did this?)  In general, follow the 
convention of the code that is already there.  For example, you should notice 
that this class has two spaces for a tab.  You should do the same for example 
here '+  void setRangeString(String minKey, String maxKey);'

I'd suggest that this is a little confusing:

{code}
+    opt.addOption(OptionBuilder.withArgName("startKey,endKey").hasArg()
+            .withDescription("default is 000000-ffffff").create("k"));
{code}

The long form is 'startKey,endKey'?  But in the description you have a hypen 
delimiting the values.  This is a little confusing.  You should be consistent.  
Also, isn't a comma or a hyphen ascii?  Would it be better to break this option 
in two, into startkey and endkey?

Is this a good thing to do?

{code}
-        "split.algorithm", MD5StringSplit.class, SplitAlgorithm.class);
+        "split.algorithm", ASCIISplit.class, SplitAlgorithm.class);
{code}

You change how the splits are done.  Shouldn't this too be an option; i.e. do 
ascii split as opposed to md5string split?

Why strip the final from the below?

{code}
-    final static String MAXMD5 = "7FFFFFFF";
-    final static BigInteger MAXMD5_INT = new BigInteger(MAXMD5, 16);
-    final static int rowComparisonLength = MAXMD5.length();
+    static String MAXMD5 = "7FFFFFFF";
+    static String MINMD5 = "00000000";
+    static BigInteger MAXMD5_INT = new BigInteger(MAXMD5, 16);
{code}

You want to use the values in more than the MD5StringSplit class?  I'd suggest 
you move them out of this class then and into the hosting class?

These looks like they should be final statics:

{code}
+         String sascii = 
"0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";
+         char[] aa = new char[sascii.length()];
+         int[] bb = new int[sascii.length()];
{code}

It looks like you have copy pasted the class comment from md5splitstring and 
you've put it on asciisplit class?  See below.  I would suggest you add a 
description about what your ascii class is doing.

{code}
+  /**
+   * MD5StringSplit is the default {@link SplitAlgorithm} for creating 
pre-split
+   * tables. The format of MD5StringSplit is the ASCII representation of an MD5
+   * checksum. Row are long values in the range <b>"00000000" => "7FFFFFFF"</b>
+   * and are left-padded with zeros to keep the same order lexographically as 
if
+   * they were binary.
+   */
{code}

Why in the ascii class are we talking about md5s?  E.g.

{code}
+    public void setRangeString(String minKey, String maxKey){
+       MAXMD5 = maxKey;
+       MINMD5 = minKey;
...
{code}

The indentation in this new ascii class is nothing like that of the surrounding 
class when we get into this split function.

I didn't look too closely at the code.  This does seem like something you could 
write a unit test for.  The unit test would help you catch the corner cases 
before this class gets committed.

Thanks.

> pre sharding in ascii
> ---------------------
>
>                 Key: HBASE-3922
>                 URL: https://issues.apache.org/jira/browse/HBASE-3922
>             Project: HBase
>          Issue Type: New Feature
>          Components: client, scripts, util
>    Affects Versions: 0.90.2
>            Reporter: mingjian
>            Priority: Minor
>         Attachments: RegionSplit.patch
>
>
> Some times we need pre-sharding some tables. With the default MD5StringSplit, 
> it can only split with BigInteger. Maybe we need split with ascii, especially 
> with [0-9,a-z,A-Z].

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

Reply via email to