Hey guys,

I ran into some issues while testing and would like to better understand what
happened. I got the following exception when I went to the web UI:

Trying to contact region server 10.129.68.204:60020 for region .META.,,1, row '', but failed after 3 attempts.
Exceptions:
org.apache.hadoop.hbase.NotServingRegionException: org.apache.hadoop.hbase.NotServingRegionException: .META.,,1
        at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:2254)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.openScanner(HRegionServer.java:1837)
        at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:648)
        at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915)


From a program that reads from an HBase table:
java.lang.reflect.UndeclaredThrowableException
        at $Proxy1.getRegionInfo(Unknown Source)
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:985)
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:625)
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:601)
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:675)
<snip>


I followed up on the HMaster's log:

2010-01-28 11:21:16,148 INFO org.apache.hadoop.hbase.master.BaseScanner: RegionManager.metaScanner scan of 1 row(s) of meta region {server: 10.129.68.204:60020, regionname: .META.,,1, startKey: <>} complete
2010-01-28 11:21:16,148 INFO org.apache.hadoop.hbase.master.BaseScanner: All 1 .META. region(s) scanned
2010-01-28 11:21:34,539 DEBUG org.apache.hadoop.hbase.master.ServerManager: Received report from unknown server -- telling it to MSG_CALL_SERVER_STARTUP: 10.129.68.203,60020,1263605543210
2010-01-28 11:21:35,622 INFO org.apache.hadoop.hbase.master.ServerManager: Received start message from: hbasetest004.ash1.facebook.com,60020,1264706494600
2010-01-28 11:21:36,649 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: Updated ZNode /hbase/rs/1264706494600 with data 10.129.68.203:60020
2010-01-28 11:21:40,704 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 39 on 60000, call createTable({NAME => 'test1', FAMILIES => [{NAME => 'cf1', VERSIONS => '3', COMPRESSION => 'NONE', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}) from 10.131.29.183:63308: error: org.apache.hadoop.hbase.TableExistsException: test1
org.apache.hadoop.hbase.TableExistsException: test1
        at org.apache.hadoop.hbase.master.HMaster.createTable(HMaster.java:792)
        at org.apache.hadoop.hbase.master.HMaster.createTable(HMaster.java:756)
        at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:648)
        at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915)

From an HRegionServer's logs:

2010-01-28 11:20:22,589 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes: Total=19.661453MB (20616528), Free=2377.0137MB (2492479408), Max=2396.675MB (2513095936), Counts: Blocks=0, Access=0, Hit=0, Miss=0, Evictions=0, Evicted=0, Ratios: Hit Ratio=NaN%, Miss Ratio=NaN%, Evicted/Run=NaN
2010-01-28 11:21:22,588 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes: Total=19.661453MB (20616528), Free=2377.0137MB (2492479408), Max=2396.675MB (2513095936), Counts: Blocks=0, Access=0, Hit=0, Miss=0, Evictions=0, Evicted=0, Ratios: Hit Ratio=NaN%, Miss Ratio=NaN%, Evicted/Run=NaN
2010-01-28 11:22:18,794 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_CALL_SERVER_STARTUP


The code says the following:
              case MSG_CALL_SERVER_STARTUP:
                // We the MSG_CALL_SERVER_STARTUP on startup but we can also
                // get it when the master is panicking because for instance
                // the HDFS has been yanked out from under it.  Be wary of
                // this message.

Any ideas on what is going on? The best I can come up with is a flaky DNS.
Would that explain this? This happened on three of our test clusters at almost
the same time. Also, what is the simplest and most graceful way to recover
from this?
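
For comparing the two server names in the master log above, here is a minimal
sketch that splits each name into its host, port and start code. The class and
method names (ServerNameSketch, parse, startedAt) are made up for illustration
and are not HBase APIs. Reading the third field as the server's start time in
epoch milliseconds is an assumption; it does line up with the timestamps in
the log if the log is in US Pacific time.

```java
import java.time.Instant;

public class ServerNameSketch {

    /** Hypothetical holder for the three comma-separated fields in a server name. */
    static final class Parsed {
        final String host;
        final int port;
        final long startCode;

        Parsed(String host, int port, long startCode) {
            this.host = host;
            this.port = port;
            this.startCode = startCode;
        }
    }

    /** Splits "host,port,startcode" as it appears in the master log lines. */
    static Parsed parse(String serverName) {
        String[] parts = serverName.trim().split(",");
        if (parts.length != 3) {
            throw new IllegalArgumentException(
                "expected host,port,startcode but got: " + serverName);
        }
        return new Parsed(parts[0], Integer.parseInt(parts[1]), Long.parseLong(parts[2]));
    }

    /** Assumption: the start code is the server start time in epoch milliseconds. */
    static Instant startedAt(Parsed p) {
        return Instant.ofEpochMilli(p.startCode);
    }

    public static void main(String[] args) {
        // Name the master did not recognise (reported by IP address).
        Parsed unknown = parse("10.129.68.203,60020,1263605543210");
        // Name in the start message that followed (reported by hostname).
        Parsed restarted = parse("hbasetest004.ash1.facebook.com,60020,1264706494600");

        System.out.println(unknown.host + " start code as UTC time: " + startedAt(unknown));
        System.out.println(restarted.host + " start code as UTC time: " + startedAt(restarted));
        System.out.println("same port, different start code: "
            + (unknown.port == restarted.port && unknown.startCode != restarted.startCode));
    }
}
```

If that reading is right, the first name carries a start code from mid-January
and the second one from the moment the master asked for a fresh startup. Note
also that the first name uses an IP address and the second a hostname, which
is at least consistent with the DNS theory but does not prove it.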


Thanks
Karthik
