[ https://issues.apache.org/jira/browse/HBASE-6461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13423601#comment-13423601 ]
Elliott Clark commented on HBASE-6461:
--------------------------------------
Small update:
I tried it again (well, 30 times actually) with more logging enabled, and I
noticed this in the NameNode log:
{noformat}
eclark@sv4r11s38:/export1/eclark$ grep "recovery started" /export1/eclark/logs/hadoop-eclark-namenode-sv4r11s38.log
2012-07-27 00:39:45,094 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* blk_3380368109770176913_1021 recovery started, primary=10.4.6.38:9908
{noformat}
The primary listed is the server that was just killed, and the block ID
is that of the RegionServer's HLog. According to the comments around the
log message, the primary is supposed to be a live DataNode. I'm wondering if
this is an HDFS bug. Thoughts?
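
To make the check repeatable across runs, here is a minimal Python sketch that scans NameNode log lines for "recovery started" entries and flags those whose primary is a DataNode known to be dead. The function name `find_dead_primaries` and the idea of supplying the dead DataNodes as a set of `host:port` strings are assumptions for illustration; only the log line format is taken from the output above.

```python
import re

# Matches the NameNode "recovery started" line shown above.
# The block ID may be negative in HDFS, hence the optional minus sign.
RECOVERY_RE = re.compile(
    r"BLOCK\*\s+(?P<block>blk_-?\d+_\d+)\s+recovery started,"
    r"\s+primary=(?P<primary>\S+)"
)


def find_dead_primaries(log_lines, dead_datanodes):
    """Return (block, primary) pairs whose recovery primary is a dead DataNode.

    dead_datanodes is a set of "host:port" strings supplied by the caller
    (hypothetical input; nothing here queries HDFS).
    """
    hits = []
    for line in log_lines:
        match = RECOVERY_RE.search(line)
        if match and match.group("primary") in dead_datanodes:
            hits.append((match.group("block"), match.group("primary")))
    return hits


sample = [
    "2012-07-27 00:39:45,094 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* "
    "blk_3380368109770176913_1021 recovery started, primary=10.4.6.38:9908",
]
print(find_dead_primaries(sample, {"10.4.6.38:9908"}))
```

Any hit means block recovery was handed to a node that cannot carry it out, which is the behavior in question here.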
> Killing the HRegionServer and DataNode hosting ROOT can result in a malformed
> root table.
> -----------------------------------------------------------------------------------------
>
> Key: HBASE-6461
> URL: https://issues.apache.org/jira/browse/HBASE-6461
> Project: HBase
> Issue Type: Bug
> Environment: hadoop-0.20.2-cdh3u3
> HBase 0.94.1 RC1
> Reporter: Elliott Clark
> Priority: Critical
> Fix For: 0.94.2
>
>
> Spun up a new DFS on hadoop-0.20.2-cdh3u3.
> Started HBase.
> Started running the loadtest tool.
> Killed the RS and DN holding ROOT with killall -9 java on server sv4r27s44 at
> about 2012-07-25 22:40:00.
> After things stabilized, ROOT was in a bad state. Ran hbck and got:
> Exception in thread "main" org.apache.hadoop.hbase.client.NoServerForRegionException: No server address listed in -ROOT- for region .META.,,1.1028785192 containing row
>     at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:1016)
>     at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:841)
>     at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:810)
>     at org.apache.hadoop.hbase.client.HTable.finishSetup(HTable.java:232)
>     at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:172)
>     at org.apache.hadoop.hbase.util.HBaseFsck.connect(HBaseFsck.java:241)
>     at org.apache.hadoop.hbase.util.HBaseFsck.main(HBaseFsck.java:3236)
> hbase(main):001:0> scan '-ROOT-'
> ROW COLUMN+CELL
> 12/07/25 22:43:18 INFO security.UserGroupInformation: JAAS Configuration already set up for Hadoop, not re-installing.
> .META.,,1 column=info:regioninfo, timestamp=1343255838525, value={NAME => '.META.,,1', STARTKEY => '', ENDKEY => '', ENCODED => 1028785192,}
> .META.,,1 column=info:v, timestamp=1343255838525, value=\x00\x00
> 1 row(s) in 0.5930 seconds
> Here's the master log: https://gist.github.com/3179194
> I tried the same thing with 0.92.1 and I was able to get into a similar
> situation, so I don't think this is anything new.
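
The `scan '-ROOT-'` output in the quoted description shows only `info:regioninfo` and `info:v` for the `.META.,,1` row, with no `info:server` cell, which fits the "No server address listed in -ROOT-" exception. As a rough sketch, the Python below finds such rows in captured shell output. It assumes each cell has been joined onto a single line; `rows_missing_server` is a hypothetical helper, not part of HBase.

```python
import re
from collections import defaultdict

# Matches one cell line of hbase shell scan output: "<row> column=<family:qualifier>, ..."
CELL_RE = re.compile(r"^\s*(?P<row>\S+)\s+column=(?P<col>[^,\s]+),")


def rows_missing_server(scan_lines):
    """Return the sorted row keys that have cells but no info:server column."""
    columns_by_row = defaultdict(set)
    for line in scan_lines:
        match = CELL_RE.match(line)
        if match:
            columns_by_row[match.group("row")].add(match.group("col"))
    return sorted(
        row for row, cols in columns_by_row.items() if "info:server" not in cols
    )


sample = [
    "ROW COLUMN+CELL",
    ".META.,,1 column=info:regioninfo, timestamp=1343255838525, "
    "value={NAME => '.META.,,1', STARTKEY => '', ENDKEY => '', ENCODED => 1028785192,}",
    ".META.,,1 column=info:v, timestamp=1343255838525, value=\\x00\\x00",
    "1 row(s) in 0.5930 seconds",
]
print(rows_missing_server(sample))
```

A row reported here has region info in the catalog but no assigned server, matching the bad state described above.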