[ https://issues.apache.org/jira/browse/HBASE-2531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12873192#action_12873192 ]

HBase Review Board commented on HBASE-2531:
-------------------------------------------

Message from: "Kannan Muthukkaruppan" <[email protected]>

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.hbase.org/r/104/
-----------------------------------------------------------

Review request for hbase.


Summary
-------

The new format for a region name contains its encodedName. The encoded name 
also serves as the directory name for the region in the filesystem.

New region name format:

      <tablename>,<startkey>,<regionIdTimestamp>/<encodedName>/

where <encodedName> is the hex-encoded MD5 hash of 
<tablename>,<startkey>,<regionIdTimestamp>
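
As a rough illustration of the new format, here is a minimal sketch that builds a 
region name from its parts. It is not the code from the patch: the class and method 
names (RegionNameSketch, md5Hex, newFormatRegionName) are made up for this example, 
and it assumes a printable start key, UTF-8 bytes and lowercase hex, none of which 
is stated above.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class RegionNameSketch {

  /** Returns the MD5 digest of data as 32 lowercase hex characters. */
  public static String md5Hex(byte[] data) {
    try {
      MessageDigest md = MessageDigest.getInstance("MD5");
      byte[] digest = md.digest(data);
      StringBuilder sb = new StringBuilder(digest.length * 2);
      for (byte b : digest) {
        sb.append(String.format("%02x", b & 0xff));
      }
      return sb.toString();
    } catch (NoSuchAlgorithmException e) {
      // MD5 is a required algorithm on every Java platform.
      throw new IllegalStateException("MD5 not available", e);
    }
  }

  /**
   * Builds <tablename>,<startkey>,<regionIdTimestamp>/<encodedName>/
   * (hypothetical helper; assumes the start key is a printable string).
   */
  public static String newFormatRegionName(String table, String startKey, long regionId) {
    String base = table + "," + startKey + "," + regionId;
    String encoded = md5Hex(base.getBytes(StandardCharsets.UTF_8));
    return base + "/" + encoded + "/";
  }

  public static void main(String[] args) {
    System.out.println(newFormatRegionName("test1", "6838000000", 1273541236167L));
  }
}
```

The encoded part alone would then double as the region's directory name.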
 
The old region name format remains:
     <tablename>,<startkey>,<regionIdTimestamp>

For region names in the old format, the encoded name is a 32-bit JenkinsHash 
integer value, written as a decimal string. 
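
To get a feel for why a 32-bit encoded name clashes so easily, the sketch below 
applies the standard birthday approximation p ~ 1 - exp(-n(n-1)/(2N)) for n names 
spread over N possible hash values. It assumes a uniformly distributed hash, which 
is an idealization; the class and method names are made up for this example. The 
old encoding takes Math.abs() of a 32-bit int, so N is taken as roughly 2^31.

```java
public class CollisionEstimate {

  /**
   * Approximate probability that at least two of n names share a hash
   * value when there are `buckets` equally likely values.
   */
  public static double collisionProbability(long n, double buckets) {
    double exponent = -((double) n * (n - 1)) / (2.0 * buckets);
    // -expm1(x) equals 1 - exp(x) but stays accurate for tiny probabilities.
    return -Math.expm1(exponent);
  }

  public static void main(String[] args) {
    double jenkinsAbs = Math.pow(2, 31);  // Math.abs() of a 32-bit int
    double md5 = Math.pow(2, 128);        // 128-bit MD5 digest
    for (long n : new long[] {1_000L, 10_000L, 100_000L}) {
      System.out.printf("n=%d  32-bit: %.3g  md5: %.3g%n",
          n, collisionProbability(n, jenkinsAbs), collisionProbability(n, md5));
    }
  }
}
```

Under these assumptions a cluster with on the order of a hundred thousand regions 
is more likely than not to contain a clash with the old encoding, while the same 
estimate for a 128-bit digest is negligible.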

**NOTE**
  
ROOT, the first META region, and regions created by an older version of HBase 
(0.20 or prior) will continue to use the old region name format.


In the logs & web ui, old format region names will show up as:
   <tablename>,<startkey>,<regionIdTimestamp>(<jenkinshashEncodedName>)
New format region names will show up as:
    <tablename>,<startkey>,<regionIdTimestamp>/<md5hashEncodedName>/
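
Since both formats will coexist, anything that reads region names has to tell them 
apart. One possible way, sketched below, is to look for the trailing 
/<32 hex chars>/ suffix. This is only an illustration and not how the patch does 
it: the class name, method name and the lowercase-hex assumption are mine, and the 
hash in main() is a placeholder rather than a real region's encoded name.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegionNameFormatSketch {

  // Assumes the encoded name is exactly 32 lowercase hex characters.
  private static final Pattern NEW_FORMAT_SUFFIX =
      Pattern.compile("/([0-9a-f]{32})/$");

  /**
   * Returns the MD5 encoded name if regionName looks like the new format,
   * or an empty Optional for an old-format name.
   */
  public static Optional<String> md5EncodedName(String regionName) {
    Matcher m = NEW_FORMAT_SUFFIX.matcher(regionName);
    return m.find() ? Optional.of(m.group(1)) : Optional.empty();
  }

  public static void main(String[] args) {
    // Placeholder hash, for illustration only.
    String newName = "test1,6838000000,1273541236167/900150983cd24fb0d6963f7d28e17f72/";
    String oldName = "test1,6838000000,1273541236167";
    System.out.println(md5EncodedName(newName).isPresent());
    System.out.println(md5EncodedName(oldName).isPresent());
  }
}
```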


This addresses bug HBASE-2531.


Diffs
-----

  trunk/bin/add_table.rb 949322 
  trunk/src/main/java/org/apache/hadoop/hbase/HRegionInfo.java 949322 
  trunk/src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java 949322 
  trunk/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java 949322 
  trunk/src/main/java/org/apache/hadoop/hbase/client/MetaScanner.java 949322 
  trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java 949322 
  trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 949322 
  trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java 949322 
  trunk/src/main/resources/hbase-webapps/master/table.jsp 949322 
  trunk/src/main/resources/hbase-webapps/regionserver/regionserver.jsp 949322 
  trunk/src/test/java/org/apache/hadoop/hbase/TestEmptyMetaInfo.java 949322 
  trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestCompaction.java 949322 
  trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestGetClosestAtOrBefore.java 949322 
  trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegionInfo.java 949322 
  trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java 949322 

Diff: http://review.hbase.org/r/104/diff


Testing
-------

unit tests pass. ran some 


Thanks,

Kannan




> 32-bit encoding of regionnames waaaaaaayyyyy too susceptible to hash clashes
> ----------------------------------------------------------------------------
>
>                 Key: HBASE-2531
>                 URL: https://issues.apache.org/jira/browse/HBASE-2531
>             Project: HBase
>          Issue Type: Bug
>            Reporter: stack
>            Assignee: Kannan Muthukkaruppan
>            Priority: Blocker
>             Fix For: 0.21.0
>
>
> Kannan tripped over two regionnames that hashed the same:
> Here is code demo'ing that his two names hash the same:
> {code}
> package org;
> import org.apache.hadoop.hbase.util.Bytes;
> import org.apache.hadoop.hbase.util.JenkinsHash;
> public class Testing {
>   public static void main(final String [] args) {
>     System.out.println(encodeRegionName(Bytes.toBytes("test1,6838000000,1273541236167")));
>     System.out.println(encodeRegionName(Bytes.toBytes("test1,0520100000,1273541610201")));
>   }
>   /**
>    * @param regionName
>    * @return the encodedName
>    */
>   public static int encodeRegionName(final byte [] regionName) {
>     return Math.abs(JenkinsHash.getInstance().hash(regionName,
>       regionName.length, 0));
>   }
> }
> {code}
> Need new encoding mechanism.  Will need to migrate old regions to new schema.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
