[ 
https://issues.apache.org/jira/browse/HBASE-6461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13423689#comment-13423689
 ] 

nkeywal commented on HBASE-6461:
--------------------------------

For HBASE-6401, the corresponding HDFS JIRA is HDFS-3701. Uma said he will 
have a patch next week.

You can validate that this is the root cause by setting the log level to DEBUG 
for org.apache.hadoop.hdfs.DFSClient. You should then see the message produced 
by this call:
          LOG.debug("DFSClient file " + src
              + " is being concurrently append to" + " but datanode "
              + primaryNode.getHostName() + " probably does not have block "
              + last.getBlock());
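
A minimal sketch of that logging change, assuming the stock log4j 1.x setup 
where HBase reads conf/log4j.properties. The region server is the HDFS client 
here, so the setting would go in the HBase configuration rather than the 
DataNode's. The file location is an assumption about your deployment:

```properties
# Assumed location: $HBASE_HOME/conf/log4j.properties on the region servers.
# Enables DEBUG output for the HDFS client class only.
log4j.logger.org.apache.hadoop.hdfs.DFSClient=DEBUG
```

A restart of the affected region server is normally needed for the change to 
be picked up.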


There is a patch in HBASE-6435. It greatly lowers the probability of hitting 
this bug, so you may want to try it (it's not totally finished, but some 
reviews would help, so... :-) ). It is written for trunk but should port 
easily to 0.94.
                
> Killing the HRegionServer and DataNode hosting ROOT can result in a malformed 
> root table.
> -----------------------------------------------------------------------------------------
>
>                 Key: HBASE-6461
>                 URL: https://issues.apache.org/jira/browse/HBASE-6461
>             Project: HBase
>          Issue Type: Bug
>         Environment: hadoop-0.20.2-cdh3u3
> HBase 0.94.1 RC1
>            Reporter: Elliott Clark
>            Priority: Critical
>             Fix For: 0.94.2
>
>
> Spun up a new dfs on hadoop-0.20.2-cdh3u3
> Started hbase
> started running loadtest tool.
> killed rs and dn holding root with killall -9 java on server sv4r27s44 at 
> about 2012-07-25 22:40:00
> After things stabilize Root is in a bad state. Ran hbck and got:
> Exception in thread "main" org.apache.hadoop.hbase.client.NoServerForRegionException: No server address listed in -ROOT- for region .META.,,1.1028785192 containing row 
>     at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:1016)
>     at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:841)
>     at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:810)
>     at org.apache.hadoop.hbase.client.HTable.finishSetup(HTable.java:232)
>     at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:172)
>     at org.apache.hadoop.hbase.util.HBaseFsck.connect(HBaseFsck.java:241)
>     at org.apache.hadoop.hbase.util.HBaseFsck.main(HBaseFsck.java:3236)
> hbase(main):001:0> scan '-ROOT-'
> ROW                                           COLUMN+CELL
> 12/07/25 22:43:18 INFO security.UserGroupInformation: JAAS Configuration already set up for Hadoop, not re-installing.
>  .META.,,1                                    column=info:regioninfo, timestamp=1343255838525, value={NAME => '.META.,,1', STARTKEY => '', ENDKEY => '', ENCODED => 1028785192,}
>  .META.,,1                                    column=info:v, timestamp=1343255838525, value=\x00\x00
> 1 row(s) in 0.5930 seconds
> Here's the master log: https://gist.github.com/3179194
> I tried the same thing with 0.92.1 and I was able to get into a similar 
> situation, so I don't think this is anything new. 

