[
https://issues.apache.org/jira/browse/HBASE-2531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12868980#action_12868980
]
stack commented on HBASE-2531:
------------------------------
Yes to Todd's suggestion.
Kannan, I'm down w/ your suggestion except for the bit where ',' is also the
delimiter between timestamp and dirname. Use a '.' or something instead.
Special meta region comparator code looks for the ',' characters dividing up
the parts of a meta key when doing sorting. The extra ',' will throw it off and
you'll get a headache trying to sort out how this comparator works. It gets
really interesting when meta splits (though currently this is disabled),
for then you have meta regionnames that look like this:
meta,TestTable,SOMESTARTKEY,TS,TS... Then throw in the fact that startkeys can be
binary and my sense is that about now you feel a migraine coming on.
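To make the delimiter problem concrete, here is a rough sketch. It is NOT the actual meta comparator; MetaKeySplitSketch and its split() method are made up for illustration, and it assumes a parser that trusts only the first and last ',' because the start key in the middle can hold arbitrary bytes.

```java
import java.util.Arrays;

/** Hypothetical illustration only; this is not HBase's actual meta comparator. */
class MetaKeySplitSketch {
    /**
     * Splits "table,startkey,id" at the first and last ','.
     * The start key may itself contain ',' (it can be binary), so only
     * the outermost delimiters are trusted.
     */
    static String[] split(String regionName) {
        int first = regionName.indexOf(',');
        int last = regionName.lastIndexOf(',');
        if (first < 0 || last <= first) {
            throw new IllegalArgumentException("Not a region name: " + regionName);
        }
        return new String[] {
            regionName.substring(0, first),        // table
            regionName.substring(first + 1, last), // start key
            regionName.substring(last + 1)         // id
        };
    }

    public static void main(String[] args) {
        // Current layout: the id part is just the timestamp.
        System.out.println(Arrays.toString(split("TestTable,SOMESTARTKEY,1273541236167")));
        // ',' between timestamp and dirname: the timestamp leaks into the start key.
        System.out.println(Arrays.toString(split("TestTable,SOMESTARTKEY,1273541236167,DIRNAME")));
        // '.' between timestamp and dirname: the three parts stay intact.
        System.out.println(Arrays.toString(split("TestTable,SOMESTARTKEY,1273541236167.DIRNAME")));
    }
}
```

With the extra ',' the parsed start key becomes "SOMESTARTKEY,1273541236167", so two regions with the same real start key would no longer compare on it cleanly. With '.' the id part is "1273541236167.DIRNAME" and the start key is untouched.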
I'm good w/ md5. 128 bits vs 160 bits for sha-1 (which seems overkill). Or we
could keep jenkins hash -- 32 bits -- and use timestamp+jenkins_hash for
naming the dir. A collision is unlikely?
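For a feel of the tradeoff, a minimal sketch using only the JDK. RegionNameHashSketch, md5Hex() and collisionProbability() are illustrative names, not anything in HBase, and the probability is the standard birthday approximation assuming uniformly distributed hashes. The 100,000 region count is an arbitrary example figure.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical sketch; names here are illustrative, not HBase APIs. */
class RegionNameHashSketch {
    /** Returns the MD5 digest of the region name as 32 lowercase hex characters. */
    static String md5Hex(byte[] regionName) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(regionName);
            StringBuilder sb = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                sb.append(String.format("%02x", b & 0xff));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is required on every Java platform, so this should not happen.
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    /** Birthday approximation: P(collision) ~= 1 - exp(-n(n-1) / 2^(bits+1)). */
    static double collisionProbability(long n, int bits) {
        double pairs = (double) n * (n - 1) / 2.0;
        return -Math.expm1(-pairs / Math.pow(2.0, bits));
    }

    public static void main(String[] args) {
        // The two region names from the issue description.
        String a = "test1,6838000000,1273541236167";
        String b = "test1,0520100000,1273541610201";
        System.out.println(md5Hex(a.getBytes(StandardCharsets.UTF_8)));
        System.out.println(md5Hex(b.getBytes(StandardCharsets.UTF_8)));

        // Chance of at least one clash among 100,000 names (assumed count).
        System.out.println("32-bit:  " + collisionProbability(100_000L, 32));
        System.out.println("128-bit: " + collisionProbability(100_000L, 128));
    }
}
```

Under these assumptions a bare 32-bit hash is more likely than not to clash somewhere among 100,000 names, while 128 bits makes a clash negligible. Prefixing the timestamp to the jenkins hash narrows the pool of names that can clash to those sharing a timestamp, which this sketch does not model.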
> 32-bit encoding of regionnames waaaaaaayyyyy too susceptible to hash clashes
> ----------------------------------------------------------------------------
>
> Key: HBASE-2531
> URL: https://issues.apache.org/jira/browse/HBASE-2531
> Project: Hadoop HBase
> Issue Type: Bug
> Reporter: stack
> Assignee: stack
> Priority: Blocker
> Fix For: 0.21.0
>
>
> Kannan tripped over two regionnames that hashed the same:
> Here is code demo'ing that his two names hash the same:
> {code}
> package org;
> import org.apache.hadoop.hbase.util.Bytes;
> import org.apache.hadoop.hbase.util.JenkinsHash;
> public class Testing {
>   public static void main(final String [] args) {
>     System.out.println(encodeRegionName(Bytes.toBytes("test1,6838000000,1273541236167")));
>     System.out.println(encodeRegionName(Bytes.toBytes("test1,0520100000,1273541610201")));
>   }
>   /**
>    * @param regionName
>    * @return the encodedName
>    */
>   public static int encodeRegionName(final byte [] regionName) {
>     return Math.abs(JenkinsHash.getInstance().hash(regionName,
>       regionName.length, 0));
>   }
> }
> {code}
> Need new encoding mechanism. Will need to migrate old regions to new schema.