Jean-Daniel Cryans created HBASE-6310:
-----------------------------------------
Summary: -ROOT- corruption when .META. is using the old encoding scheme
Key: HBASE-6310
URL: https://issues.apache.org/jira/browse/HBASE-6310
Project: HBase
Issue Type: Improvement
Affects Versions: 0.94.0
Reporter: Jean-Daniel Cryans
Priority: Blocker
Fix For: 0.96.0, 0.94.2
We're still working on the root cause here, but after the leap second
armageddon we had a hard time getting our 0.94 cluster back up. This is what we
saw in the logs until the master died by itself:
{noformat}
2012-07-01 23:01:52,149 DEBUG
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation:
locateRegionInMeta parentTable=-ROOT-,
metaLocation={region=-ROOT-,,0.70236052, hostname=sfor3s28,
port=10304}, attempt=16 of 100 failed; retrying after sleep of 32000
because: HRegionInfo was null or empty in -ROOT-,
row=keyvalues={.META.,,1259448304806/info:server/1341124914705/Put/vlen=14/ts=0,
.META.,,1259448304806/info:serverstartcode/1341124914705/Put/vlen=8/ts=0}
{noformat}
(it's strange that we retry this)
This was really misleading because I could see the regioninfo in a scan:
{noformat}
hbase(main):002:0> scan '-ROOT-'
ROW                     COLUMN+CELL
 .META.,,1              column=info:regioninfo, timestamp=1331755381142, value={NAME => '.META.,,1', STARTKEY => '', ENDKEY => '', ENCODED => 1028785192,}
 .META.,,1              column=info:server, timestamp=1341183448693, value=sfor3s40:10304
 .META.,,1              column=info:serverstartcode, timestamp=1341183448693, value=1341183444689
 .META.,,1              column=info:v, timestamp=1331755419291, value=\x00\x00
 .META.,,1259448304806  column=info:server, timestamp=1341124914705, value=sfor3s24:10304
 .META.,,1259448304806  column=info:serverstartcode, timestamp=1341124914705, value=1341124455863
{noformat}
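To connect the scan with the "HRegionInfo was null or empty" message, here is a minimal sketch (plain Python, not HBase code) that groups the cells from the scan above by row and lists the rows that carry no info:regioninfo column. The helper name rows_missing_regioninfo is made up for illustration, and the cell list is simply copied from the scan output.

```python
# (row, column) pairs copied from the scan of -ROOT- shown in this report.
cells = [
    (".META.,,1", "info:regioninfo"),
    (".META.,,1", "info:server"),
    (".META.,,1", "info:serverstartcode"),
    (".META.,,1", "info:v"),
    (".META.,,1259448304806", "info:server"),
    (".META.,,1259448304806", "info:serverstartcode"),
]


def rows_missing_regioninfo(cells):
    """Hypothetical helper: return rows that have cells but no info:regioninfo."""
    columns_by_row = {}
    for row, column in cells:
        columns_by_row.setdefault(row, set()).add(column)
    return sorted(
        row for row, columns in columns_by_row.items()
        if "info:regioninfo" not in columns
    )


# Print the rows that a lookup expecting info:regioninfo would trip over.
print(rows_missing_regioninfo(cells))
```

The only row it reports is the one named in the log line, which only has info:server and info:serverstartcode.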
Except that the devil is in the details: ".META.,,1" is not
".META.,,1259448304806". Basically, something writes to .META. by directly
creating the row key without caring whether the row is in the old format. I did
a deleteall in the shell and it fixed the issue... until some time later, when
it got stuck again because the edits had reappeared (still not sure why). This
time the PostOpenDeployTasksThread threads were stuck in the RS trying to
update .META., but there was no logging (I saw it with a jstack). I deleted the
row again to make it work.
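To make the difference between the two row keys concrete, here is a small sketch (plain Python, not HBase code). It assumes the "table,startkey,regionid" layout that the two names above appear to follow; split_region_name is a hypothetical helper, not an HBase API.

```python
def split_region_name(name):
    """Hypothetical helper: split 'table,startkey,regionid' into its parts.

    The table name is taken up to the first comma and the region id after the
    last comma, so a start key that itself contains commas stays intact.
    """
    table, rest = name.split(",", 1)
    start_key, region_id = rest.rsplit(",", 1)
    return table, start_key, region_id


row_a = split_region_name(".META.,,1")
row_b = split_region_name(".META.,,1259448304806")

# Same table and same (empty) start key, but a different region id,
# so the two names are distinct row keys in -ROOT-.
same_table_and_start = row_a[:2] == row_b[:2]
same_row_key = row_a == row_b
print(row_a, row_b, same_table_and_start, same_row_key)
```

Under that assumption, a writer that builds the key from one region id and a reader that expects the other will never meet on the same row, which matches the symptom described above.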
I'm marking this as a blocker against 0.94.2 since we're trying to get 0.94.1
out, but I wouldn't recommend upgrading to 0.94 if your cluster was created
before 0.89.