Balazs Meszaros created HBASE-30160:
---------------------------------------

             Summary: Prevent region creation if the encoded region names are 
the same
                 Key: HBASE-30160
                 URL: https://issues.apache.org/jira/browse/HBASE-30160
             Project: HBase
          Issue Type: Sub-task
            Reporter: Balazs Meszaros


HBase encoded region names are hashes computed like this: 
MD5(tableName,startKey,...). With specially crafted start keys we can easily 
create collisions, like this:

{noformat}
hbase:001:0> create 'table1', 'f', SPLITS => 
["\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00^B\xb9\x99\xdb\xb7\x98W\xfa\xa1\xe0\xf1\xbc\x09h]1S[&u*\x93\xa1&RzF\x87\x9e\x970\x84\xe5\xb9\xe3ln*l\x07\x0c\xef\x03\x96Q\xbdC!\xb1\xdec-\xfb+\x11\x83h\xc1\xbe$\x1f\xae\x95\xaf\xd3W\x07\x8a\x01\xfa\xf1\xba\x83\x8c}\xa5A1\x83\xae\xae\xf8\xe6\xf9\xe5F\xa7\xc9\x1a\xfeM\xec\x07\xdem\x0em\x9e\x97\xf4\x16\x08\x94\xa8\x8a87\x07\xb5v\xac\xe7\x07\x10\x22\xfc\xb9\x1fm\xbd\x13V\xa9\xedX\xf0\xb1",
 
"\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00^B\xb9\x99\xdb\xb7\x98W\xfa\xa1\xe0\xf1\xbc\x09h]1S[\xa6u*\x93\xa1&RzF\x87\x9e\x970\x84\xe5\xb9\xe3ln*l\x07\x0c\xef\x03\x96\xd1\xbcC!\xb1\xdec-\xfb+\x11\x83h\xc1>$\x1f\xae\x95\xaf\xd3W\x07\x8a\x01\xfa\xf1\xba\x83\x8c}\xa5A1\x83\xae\xae\xf8f\xf9\xe5F\xa7\xc9\x1a\xfeM\xec\x07\xdem\x0em\x9e\x97\xf4\x16\x08\x94\xa8\x8a87\x075w\xac\xe7\x07\x10\x22\xfc\xb9\x1fm\xbd\x13V)\xedX\xf0\xb1"]

ERROR: The procedure 9 is still running

For usage try 'help "create"'

Took 608.8101 seconds
{noformat}

Table creation fails because both regions get the same encoded name (hash):

{noformat}
2026-05-13 09:34:23,762 INFO  org.apache.hadoop.hbase.regionserver.HRegion: 
[RegionOpenAndInit-table1-pool-2]: creating {ENCODED => 
647314dfe2b7e604e08fd7fd3fec44fc, NAME => 'table1,...
2026-05-13 09:34:23,764 INFO  org.apache.hadoop.hbase.regionserver.HRegion: 
[RegionOpenAndInit-table1-pool-1]: creating {ENCODED => 
647314dfe2b7e604e08fd7fd3fec44fc, NAME => 'table1,...
2026-05-13 09:34:23,772 WARN  org.apache.hadoop.hdfs.DataStreamer: 
[Thread-140]: DataStreamer Exception
java.io.FileNotFoundException: File does not exist: 
/hbase/data/default/table1/647314dfe2b7e604e08fd7fd3fec44fc/.regioninfo (inode 
16653) [Lease.  Holder: DFSClient_NONMAPREDUCE_1353520776_1, pending creates: 3]
        at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:3194)
        at 
org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.analyzeFileState(FSDirWriteFileOp.java:609)
...
{noformat}

The procedure never finishes and blocks any further attempt to create 
{{table1}}.

The same issue should also be triggerable by splitting the table twice with 
colliding split keys:

{noformat}
split 'table1', 'malicious-key1'
split 'table1', 'malicious-key2'
{noformat}

It would be hard to replace MD5 with something else, but we should handle these 
collisions better: check whether any of the encoded region names are the same 
and, if so, fail immediately. Under normal circumstances, the chance of a 
collision with automatic splitting is very low.
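
To illustrate the proposed check, here is a minimal sketch in Python rather 
than HBase's Java. It is not the actual patch. It assumes the encoded name is 
simply the MD5 hex digest of "tableName,startKey,regionId", which is a 
simplification of the MD5(tableName,startKey,...) scheme described above. The 
names encoded_region_name, find_collisions and check_regions are hypothetical 
and do not exist in HBase. The idea is to compute all encoded names up front, 
for the new regions and for any regions that already exist (the split case), 
and to reject the request before anything is written to HDFS.

```python
import hashlib
from collections import defaultdict


def encoded_region_name(table: bytes, start_key: bytes, region_id: int) -> str:
    """Simplified stand-in for MD5(tableName,startKey,...); not HBase's exact layout."""
    name = table + b"," + start_key + b"," + str(region_id).encode("ascii")
    return hashlib.md5(name).hexdigest()


def find_collisions(table, start_keys, region_id, existing=(),
                    encode=encoded_region_name):
    """Return {encoded_name: [start_keys]} for each conflicting encoded name.

    A name conflicts if two of the new start keys map to it, or if it is
    already listed in `existing` (encoded names of regions that already exist).
    """
    by_encoded = defaultdict(list)
    for key in start_keys:
        by_encoded[encode(table, key, region_id)].append(key)
    taken = set(existing)
    return {
        enc: keys
        for enc, keys in by_encoded.items()
        if len(keys) > 1 or enc in taken
    }


def check_regions(table, start_keys, region_id, existing=(),
                  encode=encoded_region_name):
    """Fail fast, before any region is created, if encoded names would clash."""
    collisions = find_collisions(table, start_keys, region_id, existing, encode)
    if collisions:
        raise ValueError(f"encoded region name collision: {sorted(collisions)}")


if __name__ == "__main__":
    # Ordinary split points hash to distinct names, so this passes silently.
    check_regions(b"table1", [b"", b"row-1000", b"row-2000"], region_id=1)
```

With the two malicious split keys from the reproduction above, a check of this 
kind would be expected to raise right away instead of leaving the procedure 
running. The encode parameter is only there so the check can be exercised 
with a deliberately weak hash in tests.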



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
