[ https://issues.apache.org/jira/browse/HBASE-4616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13174518#comment-13174518 ]

stack commented on HBASE-4616:
------------------------------

Regards UUID, if it's an md5 anyway, let's call it what it is.
I tried to find a hash that was smaller in size, but it seems
that even murmur hash v3 is 128 bits now. If md5, you can get the hash
on the command line if you have to, e.g. 'echo -n "tablename" | md5sum';
is there a uuid command-line tool so readily available?

I was thinking too we could store the base64 of the md5 so the size in meta is less,
but that's probably more pain than it's worth.
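
For what it's worth, the width difference is easy to check with plain JDK (nothing HBase-specific here; 'usertable' is just a stand-in name): an md5 is 16 raw bytes, so its hex form is 32 chars and its base64 form is 24.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.Base64;

public class Md5Width {
  public static void main(String[] args) throws Exception {
    // md5 is always 16 raw bytes, regardless of input length
    byte[] md5 = MessageDigest.getInstance("MD5")
        .digest("usertable".getBytes(StandardCharsets.UTF_8));
    StringBuilder hex = new StringBuilder();
    for (byte b : md5) hex.append(String.format("%02x", b));
    String b64 = Base64.getEncoder().encodeToString(md5);
    // hex: 32 chars, base64 (with '==' padding): 24 chars
    System.out.println(hex.length() + " " + b64.length());
  }
}
```

So base64 saves 8 bytes per row key; probably not worth the extra decode step, as said above.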

Or, what would it take to keep a file of tablename-to-int-code mappings in
the filesystem and use the zero-padded int code in .META. instead?
All tables under value 10 would be catalog tables... .META. would
be 00000, etc.  It could sit beside the hbase.version file at the head of the
filesystem; we could call the file .hbase.tablenames.map or some such.
UUID/md5 is nice in that it makes it so we don't need this, but they
are wide compared to a short.  Maybe UUID/md5 is the way to go -- less
complexity, and the width is not really a concern since we don't write
the tablename into keys (unless we are replicating I suppose; hmm... maybe
UUID/md5 is safer because two clusters could have the same code for different
table names -- that'd get really confusing... if you stay w/ md5s, you
have the same behavior we have now regards cross-dc and table names).
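
To make the idea concrete, a minimal sketch of that hypothetical map (the 5-digit zero padding, the reserved 0-9 range for catalog tables, and .META. = 00000 are all taken from the paragraph above; persisting it to a .hbase.tablenames.map file is left out):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class TableNameCodes {
  private final Map<String, Integer> codes = new LinkedHashMap<>();
  private int next = 10; // 0-9 reserved for catalog tables; .META. is 0

  public TableNameCodes() {
    codes.put(".META.", 0);
  }

  // Zero-padded code used in place of the tablename, e.g. ".META." -> "00000".
  public String codeFor(String tableName) {
    Integer code = codes.computeIfAbsent(tableName, t -> next++);
    return String.format("%05d", code);
  }

  public static void main(String[] args) {
    TableNameCodes codes = new TableNameCodes();
    System.out.println(codes.codeFor(".META."));    // catalog table
    System.out.println(codes.codeFor("usertable")); // first user table
  }
}
```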

I think we still need to make a list of pros/cons for removing tablename from meta.

In src/main/java/org/apache/hadoop/hbase/HRegionInfo.java, why do we import
MetaReader?  Using MetaReader inside HRI is not on, I'd say.  That's a base hbase
class using a subpackage class that does client operations... HRI should
be usable w/o need of a cluster/network.  Ditto CatalogTracker.

Don't do this... expand it: +import org.apache.hadoop.hbase.util.*;

How is this new format going to play w/ the old format?  We have to convert
all the old formats as part of migration into the new format?  What about
the fact that the dir in the filesystem that holds region data is named
for a hash of the HRI name?  We can't rename all dirs in the fs as part of
migration?  Or do you think we should?

You could add some javadoc here:

    * New region name format:
-   *    <tablename>,,<startkey>,<regionIdTimestamp>.<encodedName>.
+   *    <tablename>,,<endkey>,<regionIdTimestamp>.<encodedName>.
    * where,
    *    <encodedName> is a hex version of the MD5 hash of
-   *    <tablename>,<startkey>,<regionIdTimestamp>
+   *    <tablename>,<endkey>,<regionIdTimestamp>

Isn't tablename the md5 of the tablename now?

What is this comment about?

+  // It should say, the tablename encoded in the region ends with 0x01,
+  // but the last region's tablename ends with 0x02
+  public static final int END_OF_TABLE_NAME = 1;
+  public static final int END_OF_TABLE_NAME_FOR_EMPTY_ENDKEY = END_OF_TABLE_NAME + 1;

Do you think these should be printable characters?  Else, they'll show in the
terminal as a box or something.  For example, if this end-of-tablename marker
were a '.', and the last table had a ':', that'd sort properly and it wouldn't
look so bad when you listed regions (it would be confusing having '.' for every
region name but the last, but less confusing than two unprintables that render
one way for 1 and another for 2).  Or you could have a comma as the end char and a
period for the last region in a table (that makes a bit more 'sense').

Lines like this are too long:


+    this.regionName = createRegionName(this.tableName, startKey, endKey, Long.toString(regionId).getBytes(), true);

Fix your formatting in places like below:

+                                         boolean newFormat){
+      // We need to be able to add a single byte easily to the regionName as we are building it up
+      // It's being used by appending delimiters and special markers.
+
+    byte [] oneByte = new byte[1];
+    //This cannot return any weird string chars as all uuid chars are a hex char.

The first comment is too indented.  And there should be a space between the close
paren and the open curly brace on the function, etc.

Use Bytes.toBytes instead of the below, if only for the fact that it's a shorter statement:

+    byte[] uuidTableName = UUID.nameUUIDFromBytes(tableName).toString().getBytes();

If null, throw exception?  Don't keep going?

+    int allocation = uuidTableName == null ? 2 : uuidTableName.length + 2;

Explain why you are adding '2' in a comment?

Why write it like this:

+    allocation += endKey == null ? 1 : endKey.length + 1;

Why not as if (endKey != null) allocation += endKey.length;
and then allocation += 1; // COMMENT EXPLAINING WHY THE +1?
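
Spelled out, the suggested rewrite might look like the below (the assumption, to be confirmed against the patch, is that the extra byte is for a trailing delimiter):

```java
public class Alloc {
  // Same arithmetic as `allocation += endKey == null ? 1 : endKey.length + 1;`
  // written out so the +1 can carry its own comment.
  static int endKeyAllocation(byte[] endKey) {
    int allocation = 0;
    if (endKey != null) {
      allocation += endKey.length; // room for the end key bytes themselves
    }
    allocation += 1; // one byte for the trailing delimiter (assumption)
    return allocation;
  }

  public static void main(String[] args) {
    System.out.println(endKeyAllocation(null) + " " + endKeyAllocation(new byte[3]));
  }
}
```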

Yeah, if some of these items are null like id or tablename, it should be error?

This doesn't seem necessary, the oneByte that is:

+      oneByte[0] = END_OF_TABLE_NAME_FOR_EMPTY_ENDKEY;

Just put the END_OF_TABLE_NAME_FOR_EMPTY_ENDKEY byte straight into the
byteArrayDataOutput instead of going through oneByte.

Ditto DELIMITER.

In fact, why we need DELIMITER when this is all fixed-length?  Can't we drop 
DELIMITER here especially if we change the END_OF_TABLE_NAME thingy to be 
printable?

I don't think that we can get away with a memcmp comparator with this HRI name.
The endrow name will be variable length so we'll still need a special
comparator that looks for the DELIMITER here on the end and compares id and end
row distinctly.  The comparator does get cleaner in that we can memcmp up to
the last instance of DELIMITER and then do the id compare (-ROOT- will be messier,
but -ROOT- should be going away, and currently there's only one region in .META.
anyway, so we shouldn't worry too much about it).
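
A rough sketch of that comparator shape -- memcmp up to the last DELIMITER, then compare the trailing id numerically (the delimiter value and the id-at-the-end layout here are illustrative assumptions, not the patch's actual constants):

```java
import java.util.Arrays;

public class RegionNameComparator {
  static final byte DELIMITER = ',';

  // Compare everything before the last DELIMITER as raw bytes, then the
  // trailing region id as a number rather than as text.
  static int compare(byte[] a, byte[] b) {
    int da = lastDelim(a), db = lastDelim(b);
    int prefix = Arrays.compare(Arrays.copyOf(a, da), Arrays.copyOf(b, db));
    if (prefix != 0) return prefix;
    long idA = Long.parseLong(new String(a, da + 1, a.length - da - 1));
    long idB = Long.parseLong(new String(b, db + 1, b.length - db - 1));
    return Long.compare(idA, idB);
  }

  static int lastDelim(byte[] arr) {
    for (int i = arr.length - 1; i >= 0; i--) {
      if (arr[i] == DELIMITER) return i;
    }
    throw new IllegalArgumentException("no delimiter");
  }

  public static void main(String[] args) {
    // A plain memcmp would sort "t,,9" after "t,,10"; the id compare fixes that.
    System.out.println(compare("t,,9".getBytes(), "t,,10".getBytes()) < 0);
  }
}
```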

This should be including the DELIMITER?

+        md5HashBytes =  MD5Hash.getMD5AsHex(metaKey).getBytes();

And you don't need the oneByte stuff; just use DELIMITER directly, same for ENC_SEPARATOR.

This comment is odd:

+   * Get the tableName associated with that region. It only supports
+   * user regions, not the meta one.

Why can't we get meta regions?  Those names are well-known; if you recognize 
it, just return the meta name?
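
The shortcut could be as simple as the below (illustrative only: the well-known prefix to match under the new encoding would be the uuid'd/md5'd catalog name, not the literal string):

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class CatalogNames {
  static final byte[] ROOT = "-ROOT-".getBytes(StandardCharsets.UTF_8);
  static final byte[] META = ".META.".getBytes(StandardCharsets.UTF_8);

  // Catalog region names are well-known; recognize them by prefix and return
  // the catalog table name directly, else fall through to the decoding path.
  static byte[] tableNameOf(byte[] regionName) {
    if (startsWith(regionName, ROOT)) return ROOT;
    if (startsWith(regionName, META)) return META;
    return null; // user region: decode via the md5/uuid path
  }

  static boolean startsWith(byte[] b, byte[] prefix) {
    if (b.length < prefix.length) return false;
    return Arrays.equals(Arrays.copyOf(b, prefix.length), prefix);
  }

  public static void main(String[] args) {
    byte[] t = tableNameOf("-ROOT-,,0".getBytes(StandardCharsets.UTF_8));
    System.out.println(new String(t, StandardCharsets.UTF_8));
  }
}
```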

This method:

+  public static byte [] getTableName(byte[] regionName, Configuration conf)

This is ugly, but maybe it's ok?  You've deprecated it (but you need to point the
user elsewhere when you deprecate something).  What are you thinking?  That it
should just not be possible to get the tablename from an HRI going forward?  This
is a kinda ugly method anyways... no longer used, I'd say, so fine if it's going
away (I seem to remember it was written for some strange context where we needed
to find the tablename walking the filesystem with no cluster up).

Ditto for the version that don't take a Configuration... but man, that's ugly,
creating a Configuration inside the method.  Needs WARNINGS in javadoc:
"DON'T DO IT!!!!!"


Follow formatting of method opening as is done elsewhere in this file; the 
below is different:

+  private static boolean isLHSSplit(final byte[] regionName,
+                                    final int offset) throws IOException {

The above method needs a bit of doc on what it's about.  And there should be a
blank line after each method... there is none here.

The below method does not look right and should be deprecated anyway (it's
returning the region name as the table name?  At least warn that it don't work any more):

   /**
-   * Get current table name of the region
+   * Get the tablename of the region from the regionKey
    * @return byte array of table name
    */
   public byte[] getTableName() {
     if (tableName == null || tableName.length == 0) {
-      tableName = getTableName(getRegionName());
+      tableName = getRegionName();
     }
     return tableName;
   }

The below 'table name' pollution does not belong in KV.  Better in 
HTableDescriptor?

+  private static String UUIDMetaTableKey = UUID.nameUUIDFromBytes(Bytes.toBytes(".META.")).toString();

Besides, ain't this a constant length?  Why bother calc'ing it?

Why are we doing this calc anyways?  Ain't all table lengths now the same whether
they are catalog (meta) tables or not?


The below is a regression that your patch shouldn't be letting in:


-import org.apache.hadoop.hbase.client.Get;
-import org.apache.hadoop.hbase.client.HTable;
-import org.apache.hadoop.hbase.client.Result;
-import org.apache.hadoop.hbase.client.ResultScanner;
-import org.apache.hadoop.hbase.client.Scan;
+import org.apache.hadoop.hbase.client.*;


This don't look right (the variable name is wrong or something):

-    HRegionLocation firstMetaServer = getFirstMetaServerForTable(tableName);
+    HRegionLocation firstMetaServer = getLastMetaServerForTable(tableName);

Do we still need this nines stuff?


-      HRegionInfo.createRegionName(tableName, null, HConstants.NINES, false));
+      HRegionInfo.createRegionName(tableName, null, null, HConstants.NINES.getBytes(), false));

Do we know what the end region in a table is called?

Why do you have to hardcode?  Can you not pass tableName here like it used to?

-        return locateRegionInMeta(HConstants.ROOT_TABLE_NAME, tableName, row,
-            useCache, metaRegionLock);
+
+        //HARD CODED TO POINT TO THE FIRST META TABLE
+        return locateRegionInMeta(HConstants.ROOT_TABLE_NAME,
+                                  HConstants.META_TABLE_NAME,
+                                  HConstants.EMPTY_BYTE_ARRAY,
+                                  useCache,
+                                  metaRegionLock);

This exception is not correct for the condition found?

+      if (result == null) {
+        throw new TableNotFoundException(Bytes.toString(tableName));
+      }

This is nice:

-      byte [] metaKey = HRegionInfo.createRegionName(tableName, row,
-        HConstants.NINES, false);
+      final byte [] metaKey = MetaSearchRow.getStartRow(tableName, row);


Getting rid of that wacky 9s stuff.

Are we sure the second region is not also offline or split -- or nvm, I see you
doing that check later?

+              regionInfoRow = result[0];
+              regionInfo = resultToHRegionInfo(regionInfoRow, tableName, parentTable);
+              if (regionInfo.isOffline()) {
+                regionInfoRow = result[1];
+                regionInfo = resultToHRegionInfo(regionInfoRow, tableName, parentTable);

Should we scan forward more than two rows and put the rest into the cache?
(We had something doing this in the current code.)

Change in HTable is whitespace only.

MetaSearchRow is utilities for doing searches on the meta table?  Seems more
generic than that?  Maybe the methods belong in HTableDescriptor as utility
methods, or maybe even in Scan?

Do we still need this nines stuff in the below?

+  public static byte[] getStopRow(final byte[] tableName) {
+    final byte[] uuidTableName = UUID.nameUUIDFromBytes(tableName).toString().getBytes();
+    byte[] b = new byte[uuidTableName.length + 3 + HConstants.NINES.getBytes().length];
+    int offset = uuidTableName.length;
+    System.arraycopy(uuidTableName, 0, b, 0, offset);
+
+    b[offset++] = (byte) (HRegionInfo.END_OF_TABLE_NAME_FOR_EMPTY_ENDKEY + 1);
+    b[offset++] = HRegionInfo.DELIMITER;
+    b[offset++] = HRegionInfo.DELIMITER;
+    System.arraycopy(HConstants.NINES.getBytes(), 0, b, offset, HConstants.NINES.getBytes().length);
+    return b;
+  }

We know the 'stop row' is the tablename uuid'd + 0x02 + empty row.  If we get a
row greater than this one, we are into the next table?
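
Rendered literally, that simpler stop row would be just the uuid'd tablename plus the 0x02 marker (a sketch of the suggestion, not the patch's code; whether 0x02 alone is a tight enough bound depends on how the last region's name is terminated):

```java
import java.nio.charset.StandardCharsets;
import java.util.UUID;

public class SimpleStopRow {
  static byte[] getStopRow(byte[] tableName) {
    // UUID.toString() is always 36 chars, so this row is a fixed 37 bytes
    byte[] uuid = UUID.nameUUIDFromBytes(tableName)
        .toString().getBytes(StandardCharsets.UTF_8);
    byte[] b = new byte[uuid.length + 1];
    System.arraycopy(uuid, 0, b, 0, uuid.length);
    b[uuid.length] = 0x02; // END_OF_TABLE_NAME_FOR_EMPTY_ENDKEY
    return b;
  }

  public static void main(String[] args) {
    System.out.println(getStopRow("usertable".getBytes(StandardCharsets.UTF_8)).length);
  }
}
```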

Do we need the zeros anymore in the below?  Ain't the region we want the first
one we bump into if we specify tablename+0x01+row?

+  public static byte[] getStartRow(final byte[] tableName, final byte[] searchRow) {
+    // Get first region in META
+    final byte[] uuidTableName = UUID.nameUUIDFromBytes(tableName).toString().getBytes();
+
+    if (searchRow == null || searchRow.length == 0) {
+      byte[] startRow = new byte[uuidTableName.length + 3 + HConstants.ZEROES.getBytes().length];
+      System.arraycopy(uuidTableName, 0, startRow, 0, uuidTableName.length);
+      startRow[uuidTableName.length] = HRegionInfo.END_OF_TABLE_NAME - 1;
+      startRow[uuidTableName.length + 1] = HRegionInfo.DELIMITER;
+      startRow[uuidTableName.length + 2] = HRegionInfo.DELIMITER;
+      System.arraycopy(HConstants.ZEROES.getBytes(), 0, startRow, uuidTableName.length + 3, HConstants.ZEROES.getBytes().length);
+
+      return startRow;
+    }

For new files, you do not need this line:

+ * Copyright 2007 The Apache Software Foundation


I don't think the below test is necessary -- especially if it means spinning up 
a cluster:

src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegionInfoGetTableName.java
                
> Update hregion encoded name to reduce logic and prevent region collisions in 
> META
> ---------------------------------------------------------------------------------
>
>                 Key: HBASE-4616
>                 URL: https://issues.apache.org/jira/browse/HBASE-4616
>             Project: HBase
>          Issue Type: Umbrella
>            Reporter: Alex Newman
>            Assignee: Alex Newman
>         Attachments: HBASE-4616-v2.patch, HBASE-4616-v3.patch, 
> HBASE-4616.patch
>
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira