32-bit encoding of regionnames waaaaaaayyyyy too susceptible to hash clashes
----------------------------------------------------------------------------

                 Key: HBASE-2531
                 URL: https://issues.apache.org/jira/browse/HBASE-2531
             Project: Hadoop HBase
          Issue Type: Bug
            Reporter: stack
            Assignee: stack
            Priority: Blocker
             Fix For: 0.20.5, 0.21.0


Kannan tripped over two regionnames that hashed the same:

Here is code demo'ing that his two names hash the same:

{code}
package org;

import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.hbase.util.JenkinsHash;

public class Testing {
  public static void main(final String [] args) {
    System.out.println(encodeRegionName(Bytes.toBytes("test1,6838000000,1273541236167")));
    System.out.println(encodeRegionName(Bytes.toBytes("test1,0520100000,1273541610201")));
  }

  /**
   * @param regionName
   * @return the encodedName
   */
  public static int encodeRegionName(final byte [] regionName) {
    return Math.abs(JenkinsHash.getInstance().hash(regionName, regionName.length, 0));
  }
}
{code}

Need new encoding mechanism.

Will need to migrate old regions to new schema.
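
For a rough sense of scale, here is a sketch that is not part of the original report. It estimates the chance of at least one clash with the standard birthday-bound approximation, assuming the hash is uniform and that `Math.abs` leaves about 2^31 usable values. It also shows one possible wider encoding (a hex MD5 digest of the region name) purely as an illustration. The class name `RegionNameEncodingSketch` and both method names are hypothetical, and nothing here is meant as the chosen fix.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class RegionNameEncodingSketch {

  /**
   * Birthday-bound approximation of the probability that at least two of
   * n names collide when hashed uniformly into 2^bits values:
   * p ~= 1 - exp(-n(n-1) / (2 * 2^bits)).
   */
  public static double collisionProbability(long n, int bits) {
    double space = Math.pow(2.0, bits);
    double pairs = (double) n * (n - 1) / 2.0;
    // expm1 keeps precision when the probability is tiny.
    return -Math.expm1(-pairs / space);
  }

  /**
   * Hypothetical wider encoding (an assumption, not the fix from this issue):
   * the 128-bit MD5 digest of the region name as 32 lowercase hex characters.
   */
  public static String encodeRegionNameWide(final byte[] regionName) {
    try {
      MessageDigest md5 = MessageDigest.getInstance("MD5");
      StringBuilder hex = new StringBuilder();
      for (byte b : md5.digest(regionName)) {
        hex.append(String.format("%02x", b));
      }
      return hex.toString();
    } catch (NoSuchAlgorithmException e) {
      // Every Java platform is required to provide MD5.
      throw new IllegalStateException("MD5 not available", e);
    }
  }

  public static void main(final String[] args) {
    // Estimated clash probability for a few cluster sizes at ~31 bits.
    for (long n : new long[] {1_000L, 10_000L, 50_000L, 100_000L}) {
      System.out.printf("%d regions: p(clash) ~ %.4f%n", n, collisionProbability(n, 31));
    }
    // The two region names from the report, encoded with the wider digest.
    System.out.println(encodeRegionNameWide("test1,6838000000,1273541236167".getBytes()));
    System.out.println(encodeRegionNameWide("test1,0520100000,1273541610201".getBytes()));
  }
}
```

The estimate only depends on the number of regions and the size of the hash space, so it applies to any uniform hash truncated to the same width. A wider digest pushes the clash probability down to negligible levels, at the cost of a longer encoded name.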