1. grid04 RS is down because of this:
2012-01-04 19:06:56,949 DEBUG
org.apache.hadoop.hbase.regionserver.HRegionServer: NotServingRegionException;
Region is not online: -ROOT-,,0
2012-01-04 19:08:05,324 DEBUG
org.apache.hadoop.hbase.regionserver.HRegionServer: NotServingRegionException;
Region is not online: -ROOT-,,0
2012-01-04 19:08:13,331 DEBUG
org.apache.hadoop.hbase.regionserver.HRegionServer: NotServingRegionException;
Region is not online: -ROOT-,,0
2012-01-04 19:08:21,337 DEBUG
org.apache.hadoop.hbase.regionserver.HRegionServer: NotServingRegionException;
Region is not online: -ROOT-,,0
2012-01-04 19:09:04,781 DEBUG
org.apache.hadoop.hbase.regionserver.HRegionServer: NotServingRegionException;
Region is not online: -ROOT-,,0
2012-01-04 19:09:12,788 DEBUG
org.apache.hadoop.hbase.regionserver.HRegionServer: NotServingRegionException;
Region is not online: -ROOT-,,0
2012-01-04 19:09:20,794 DEBUG
org.apache.hadoop.hbase.regionserver.HRegionServer: NotServingRegionException;
Region is not online: -ROOT-,,0
2012-01-04 19:09:28,799 DEBUG
org.apache.hadoop.hbase.regionserver.HRegionServer: NotServingRegionException;
Region is not online: -ROOT-,,0
2012-01-04 19:11:28,193 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache:
LRU Stats: total=9.82 MB, free=1.16 GB, max=1.17 GB, blocks=0, accesses=0,
hits=0, hitRatio=?%, cachingAccesses=0, cachingHits=0, cachingHitsRatio=?%,
evictions=0, evicted=0, evictedPerRun=NaN
2012-01-04 19:14:13,473 DEBUG
org.apache.hadoop.hbase.regionserver.HRegionServer: NotServingRegionException;
Region is not online: -ROOT-,,0
2012-01-04 19:14:21,480 DEBUG
org.apache.hadoop.hbase.regionserver.HRegionServer: NotServingRegionException;
Region is not online: -ROOT-,,0
2012-01-04 19:14:29,487 DEBUG
org.apache.hadoop.hbase.regionserver.HRegionServer: NotServingRegionException;
Region is not online: -ROOT-,,0
2012-01-04 19:14:37,493 DEBUG
org.apache.hadoop.hbase.regionserver.HRegionServer: NotServingRegionException;
Region is not online: -ROOT-,,0
2012-01-04 19:14:41,463 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: Exiting; cluster
shutdown set and not carrying any regions
2012-01-04 19:14:41,463 INFO org.apache.hadoop.ipc.HBaseServer: Stopping server
on 60020
2012-01-04 19:14:41,463 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 4 on 60020: exiting
2012-01-04 19:14:41,463 INFO org.apache.hadoop.ipc.HBaseServer: PRI IPC Server
handler 2 on 60020: exiting
2012-01-04 19:14:41,463 INFO org.apache.hadoop.ipc.HBaseServer: PRI IPC Server
handler 3 on 60020: exiting
2012-01-04 19:14:41,463 INFO org.apache.hadoop.ipc.HBaseServer: PRI IPC Server
handler 4 on 60020: exiting
2012-01-04 19:14:41,463 INFO org.apache.hadoop.ipc.HBaseServer: PRI IPC Server
handler 6 on 60020: exiting
2012-01-04 19:14:41,463 INFO org.apache.hadoop.ipc.HBaseServer: PRI IPC Server
handler 5 on 60020: exiting
2012-01-04 19:14:41,463 INFO org.apache.hadoop.ipc.HBaseServer: Stopping IPC
Server Responder
2012-01-04 19:14:41,464 INFO org.apache.hadoop.ipc.HBaseServer: PRI IPC Server
handler 9 on 60020: exiting
2012-01-04 19:14:41,463 INFO org.apache.hadoop.ipc.HBaseServer: PRI IPC Server
handler 1 on 60020: exiting
2012-01-04 19:14:41,464 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 0 on 60020: exiting
2012-01-04 19:14:41,463 INFO org.apache.hadoop.ipc.HBaseServer: PRI IPC Server
handler 0 on 60020: exiting
2012-01-04 19:14:41,464 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: Stopping infoServer
2012-01-04 19:14:41,464 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 3 on 60020: exiting
2012-01-04 19:14:41,464 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 2 on 60020: exiting
2012-01-04 19:14:41,464 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 1 on 60020: exiting
2012-01-04 19:14:41,464 INFO org.apache.hadoop.ipc.HBaseServer: PRI IPC Server
handler 8 on 60020: exiting
2012-01-04 19:14:41,464 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 6 on 60020: exiting
2012-01-04 19:14:41,464 INFO org.apache.hadoop.ipc.HBaseServer: PRI IPC Server
handler 7 on 60020: exiting
2. ZK /hbase/root-region-server is empty
3. The Master log has a lot of these errors:
2012-01-04 19:14:13,453 DEBUG org.apache.hadoop.hbase.client.MetaScanner:
Scanning .META. starting at row= for max=2147483647 rows
2012-01-04 19:14:13,455 DEBUG
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation:
Lookedup root region location,
connection=org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@752a2259;
hsa=us01-ciqps1-grid04.carrieriq.com:60020
2012-01-04 19:14:13,471 DEBUG
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation:
locateRegionInMeta parentTable=-ROOT-, metaLocation=address:
us01-ciqps1-grid04.carrieriq.com:60020, regioninfo: -ROOT-,,0.70236052,
attempt=0 of 20 failed; retrying after sleep of 8000 because:
org.apache.hadoop.hbase.NotServingRegionException: Region is not online:
-ROOT-,,0
at
org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:2359)
at
org.apache.hadoop.hbase.regionserver.HRegionServer.getClosestRowBefore(HRegionServer.java:1645)
at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:570)
at
org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1039)
2012-01-04 19:14:13,472 DEBUG
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation:
Lookedup root region location,
connection=org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@752a2259;
hsa=us01-ciqps1-grid04.carrieriq.com:60020
2012-01-04 19:14:13,632 DEBUG org.apache.hadoop.hbase.util.FSUtils: Failed
fs.recoverLease invocation, java.lang.NoSuchMethodException:
org.apache.hadoop.hdfs.DistributedFileSystem.recoverLease(org.apache.hadoop.fs.Path),
trying fs.append instead
2012-01-04 19:14:13,634 WARN org.apache.hadoop.hbase.util.FSUtils: Waited
763847ms for lease recovery on
hdfs://us01-ciqps1-name01.carrieriq.com:9000/hbase/.logs/us01-ciqps1-grid07.carrieriq.com,60020,1325640817081/us01-ciqps1-grid07.carrieriq.com%3A60020.1325640817575:org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException:
failed to create file
/hbase/.logs/us01-ciqps1-grid07.carrieriq.com,60020,1325640817081/us01-ciqps1-grid07.carrieriq.com%3A60020.1325640817575
for DFSClient_hb_m_us01-ciqps1-name01.carrieriq.com:60000_1325703680503 on
client 10.202.50.100, because this file is already being created by NN_Recovery
on 10.202.50.100
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:1093)
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFile(FSNamesystem.java:1181)
at
org.apache.hadoop.hdfs.server.namenode.NameNode.append(NameNode.java:422)
From the log files on grid04 and on the Master I cannot find who is trying to
assign -ROOT-. It looks like the Master thinks -ROOT- is on grid04, while
grid04 thinks it is on some other node. That is why grid04 waits for -ROOT- to
come online (which is not going to happen) and then shuts itself down. The
Master, in turn, waits forever for grid04 to become available (but it is
already down).
Best regards,
Vladimir Rodionov
Principal Platform Engineer
Carrier IQ, www.carrieriq.com
e-mail: [email protected]
________________________________________
From: [email protected] [[email protected]] On Behalf Of Stack
[[email protected]]
Sent: Wednesday, January 04, 2012 1:09 PM
To: [email protected]
Subject: Re: -ROOT- is offline
On Wed, Jan 4, 2012 at 12:08 PM, Vladimir Rodionov
<[email protected]> wrote:
It thinks -ROOT- is here: us01-ciqps1-grid04.carrieriq.com:60020
Is that server up? Does it have root? When your cluster starts, does
master log show it trying to assign the -ROOT- or is it having same
issue?
Is your zk up? If so, when you look at the /hbase/root-region-server,
what does it have in it? Is it the above server?
To look, you can bring up zk cli doing something like below:
hbase zkcli -server 3.zookeeper:2181
You could try removing the root-region-server znode and try a restart.
St.Ack