[
https://issues.apache.org/jira/browse/HBASE-6310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13410539#comment-13410539
]
Jean-Daniel Cryans commented on HBASE-6310:
-------------------------------------------
Maybe. I can't tell for sure until we find the code that has the issue, but
we've been running 0.92 for more than 6 months on multiple clusters and never
had this issue, whereas this one 0.94 cluster has it.
> -ROOT- corruption when .META. is using the old encoding scheme
> --------------------------------------------------------------
>
> Key: HBASE-6310
> URL: https://issues.apache.org/jira/browse/HBASE-6310
> Project: HBase
> Issue Type: Improvement
> Affects Versions: 0.94.0
> Reporter: Jean-Daniel Cryans
> Priority: Blocker
> Fix For: 0.96.0, 0.94.2
>
>
> We're still working on the root cause here, but after the leap second
> armageddon we had a hard time getting our 0.94 cluster back up. This is what
> we saw in the logs until the master died by itself:
> {noformat}
> 2012-07-01 23:01:52,149 DEBUG
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation:
> locateRegionInMeta parentTable=-ROOT-,
> metaLocation={region=-ROOT-,,0.70236052, hostname=sfor3s28,
> port=10304}, attempt=16 of 100 failed; retrying after sleep of 32000
> because: HRegionInfo was null or empty in -ROOT-,
> row=keyvalues={.META.,,1259448304806/info:server/1341124914705/Put/vlen=14/ts=0,
> .META.,,1259448304806/info:serverstartcode/1341124914705/Put/vlen=8/ts=0}
> {noformat}
> (it's strange that we retry this)
> This was really misleading because I could see the regioninfo in a scan:
> {noformat}
> hbase(main):002:0> scan '-ROOT-'
> ROW COLUMN+CELL
> .META.,,1 column=info:regioninfo,
> timestamp=1331755381142, value={NAME => '.META.,,1', STARTKEY => '',
> ENDKEY => '', ENCODED => 1028785192,}
> .META.,,1 column=info:server,
> timestamp=1341183448693, value=sfor3s40:10304
> .META.,,1
> column=info:serverstartcode, timestamp=1341183448693,
> value=1341183444689
> .META.,,1 column=info:v,
> timestamp=1331755419291, value=\x00\x00
> .META.,,1259448304806 column=info:server,
> timestamp=1341124914705, value=sfor3s24:10304
> .META.,,1259448304806
> column=info:serverstartcode, timestamp=1341124914705,
> value=1341124455863
> {noformat}
> Except that the devil is in the details: ".META.,,1" is not
> ".META.,,1259448304806". Basically something writes to .META. by directly
> creating the row key without caring if the row is in the old format. I did a
> deleteall in the shell and it fixed the issue... until some time later it was
> stuck again because the edits reappeared (still not sure why). This time the
> PostOpenDeployTasksThread threads were stuck in the RS trying to update .META. but
> there was no logging (saw it with a jstack). I deleted the row again to make
> it work.
> I'm marking this as a blocker against 0.94.2 since we're trying to get 0.94.1
> out, but I wouldn't recommend upgrading to 0.94 if your cluster was created
> before 0.89.
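
The mismatch described above (two -ROOT- rows for the same region prefix, only one of which carries info:regioninfo) can be sketched as a small standalone check. This is an illustration only, not the HBase fix and not HBase client code: the class RootRowCheck and its methods are hypothetical, and the rows are modelled as plain Java maps filled in by hand from the scan output in the description. It also assumes that the region id is whatever follows the last comma in the row key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper: models -ROOT- rows as plain maps, no HBase API involved. */
public class RootRowCheck {

    static final String REGIONINFO = "info:regioninfo";

    /** Assumption: everything before the last comma is the table and start key. */
    public static String prefixOf(String rowKey) {
        int lastComma = rowKey.lastIndexOf(',');
        return lastComma < 0 ? rowKey : rowKey.substring(0, lastComma);
    }

    /** Returns the row keys that have columns but no info:regioninfo. */
    public static List<String> rowsMissingRegionInfo(Map<String, Set<String>> rows) {
        List<String> suspects = new ArrayList<>();
        for (Map.Entry<String, Set<String>> row : rows.entrySet()) {
            if (!row.getValue().contains(REGIONINFO)) {
                suspects.add(row.getKey());
            }
        }
        return suspects;
    }

    public static void main(String[] args) {
        // Row keys and column names copied from the scan of -ROOT- in the report.
        Map<String, Set<String>> rows = new LinkedHashMap<>();
        rows.put(".META.,,1", new LinkedHashSet<>(Arrays.asList(
                "info:regioninfo", "info:server", "info:serverstartcode", "info:v")));
        rows.put(".META.,,1259448304806", new LinkedHashSet<>(Arrays.asList(
                "info:server", "info:serverstartcode")));

        for (String suspect : rowsMissingRegionInfo(rows)) {
            // A suspect row shares its prefix with a healthy row but differs in region id.
            System.out.println("no regioninfo: " + suspect
                    + " (prefix '" + prefixOf(suspect) + "')");
        }
    }
}
```

With the two rows from the scan, only ".META.,,1259448304806" is reported, which matches the row named in the "HRegionInfo was null or empty" log message.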