Hey guys,
I ran into some issues while testing and wanted to understand better what happened.
I got the following exception when I went to the web UI:
Trying to contact region server 10.129.68.204:60020 for region .META.,,1, row
'', but failed after 3 attempts.
Exceptions:
org.apache.hadoop.hbase.NotServingRegionException:
org.apache.hadoop.hbase.NotServingRegionException: .META.,,1
at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:2254)
at org.apache.hadoop.hbase.regionserver.HRegionServer.openScanner(HRegionServer.java:1837)
at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:648)
at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915)
From a program that reads from an HBase table:
java.lang.reflect.UndeclaredThrowableException
at $Proxy1.getRegionInfo(Unknown Source)
at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:985)
at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:625)
at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:601)
at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:675)
<snip>
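For reference, the reader is nothing fancy - a minimal client along the lines below goes through the same locateRegion path (the table name and row key here are just placeholders for illustration, not our actual code):

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class Reader {
  public static void main(String[] args) throws Exception {
    HBaseConfiguration conf = new HBaseConfiguration();
    // Opening the table and doing the get both need -ROOT-/.META. lookups,
    // which is the locateRegion path in the stack trace above.
    HTable table = new HTable(conf, "test1");           // table name just for illustration
    Result row = table.get(new Get(Bytes.toBytes("some-row")));
    System.out.println(row);
  }
}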
Following up in the HMaster's log:
2010-01-28 11:21:16,148 INFO org.apache.hadoop.hbase.master.BaseScanner:
RegionManager.metaScanner scan of 1 row(s) of meta region {server:
10.129.68.204:60020, regionname: .META.,,1, startKey: <>} complete
2010-01-28 11:21:16,148 INFO org.apache.hadoop.hbase.master.BaseScanner: All 1
.META. region(s) scanned
2010-01-28 11:21:34,539 DEBUG org.apache.hadoop.hbase.master.ServerManager:
Received report from unknown server -- telling it to MSG_CALL_SERVER_STARTUP:
10.129.68.203,60020,1263605543210
2010-01-28 11:21:35,622 INFO org.apache.hadoop.hbase.master.ServerManager:
Received start message from: hbasetest004.ash1.facebook.com,60020,1264706494600
2010-01-28 11:21:36,649 DEBUG
org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: Updated ZNode
/hbase/rs/1264706494600 with data 10.129.68.203:60020
2010-01-28 11:21:40,704 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 39 on 60000, call createTable({NAME => 'test1', FAMILIES => [{NAME =>
'cf1', VERSIONS => '3', COMPRESSION => 'NONE', TTL => '2147483647', BLOCKSIZE
=> '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}) from
10.131.29.183:63308: error: org.apache.hadoop.hbase.TableExistsException: test1
org.apache.hadoop.hbase.TableExistsException: test1
at org.apache.hadoop.hbase.master.HMaster.createTable(HMaster.java:792)
at org.apache.hadoop.hbase.master.HMaster.createTable(HMaster.java:756)
at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:648)
at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915)
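For context, the createTable in that log entry is just the standard admin API call from a client - roughly the following, where the column family defaults match the descriptor printed in the log and everything else is placeholder:

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class CreateTest1 {
  public static void main(String[] args) throws Exception {
    HBaseConfiguration conf = new HBaseConfiguration();
    HBaseAdmin admin = new HBaseAdmin(conf);
    HTableDescriptor desc = new HTableDescriptor("test1");
    // Default column family settings (3 versions, no compression, etc.)
    // are the ones shown in the master log line above.
    desc.addFamily(new HColumnDescriptor("cf1"));
    admin.createTable(desc);   // throws TableExistsException if test1 already exists
  }
}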
From an HRegionServer's logs:
2010-01-28 11:20:22,589 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache:
Cache Stats: Sizes: Total=19.661453MB (20616528), Free=2377.0137MB
(2492479408), Max=2396.675MB (2513095936), Counts: Blocks=0, Access=0, Hit=0,
Miss=0, Evictions=0, Evicted=0, Ratios: Hit Ratio=NaN%, Miss Ratio=NaN%,
Evicted/Run=NaN
2010-01-28 11:21:22,588 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache:
Cache Stats: Sizes: Total=19.661453MB (20616528), Free=2377.0137MB
(2492479408), Max=2396.675MB (2513095936), Counts: Blocks=0, Access=0, Hit=0,
Miss=0, Evictions=0, Evicted=0, Ratios: Hit Ratio=NaN%, Miss Ratio=NaN%,
Evicted/Run=NaN
2010-01-28 11:22:18,794 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_CALL_SERVER_STARTUP
The code says the following:
case MSG_CALL_SERVER_STARTUP:
// We get the MSG_CALL_SERVER_STARTUP on startup but we can also
// get it when the master is panicking because for instance
// the HDFS has been yanked out from under it. Be wary of
// this message.
Any ideas on what is going on? The best I can come up with is flaky DNS - would
that explain this? It happened on three of our test clusters at almost the same
time. Also, what is the most graceful/simplest way to recover from this?
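If it helps narrow things down, a quick sanity check for the DNS theory is a forward/reverse lookup against the hosts from the logs, along these lines:

import java.net.InetAddress;

public class DnsCheck {
  public static void main(String[] args) throws Exception {
    // Forward lookup of the region server hostname from the master log
    InetAddress byName = InetAddress.getByName("hbasetest004.ash1.facebook.com");
    System.out.println("forward: " + byName.getHostAddress());
    // Reverse lookup of the address the master registered in ZooKeeper
    InetAddress byIp = InetAddress.getByName("10.129.68.203");
    System.out.println("reverse: " + byIp.getCanonicalHostName());
  }
}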
Thanks
Karthik