1. grid04 RS is down because of this:

2012-01-04 19:06:56,949 DEBUG 
org.apache.hadoop.hbase.regionserver.HRegionServer: NotServingRegionException; 
Region is not online: -ROOT-,,0
2012-01-04 19:08:05,324 DEBUG 
org.apache.hadoop.hbase.regionserver.HRegionServer: NotServingRegionException; 
Region is not online: -ROOT-,,0
2012-01-04 19:08:13,331 DEBUG 
org.apache.hadoop.hbase.regionserver.HRegionServer: NotServingRegionException; 
Region is not online: -ROOT-,,0
2012-01-04 19:08:21,337 DEBUG 
org.apache.hadoop.hbase.regionserver.HRegionServer: NotServingRegionException; 
Region is not online: -ROOT-,,0
2012-01-04 19:09:04,781 DEBUG 
org.apache.hadoop.hbase.regionserver.HRegionServer: NotServingRegionException; 
Region is not online: -ROOT-,,0
2012-01-04 19:09:12,788 DEBUG 
org.apache.hadoop.hbase.regionserver.HRegionServer: NotServingRegionException; 
Region is not online: -ROOT-,,0
2012-01-04 19:09:20,794 DEBUG 
org.apache.hadoop.hbase.regionserver.HRegionServer: NotServingRegionException; 
Region is not online: -ROOT-,,0
2012-01-04 19:09:28,799 DEBUG 
org.apache.hadoop.hbase.regionserver.HRegionServer: NotServingRegionException; 
Region is not online: -ROOT-,,0
2012-01-04 19:11:28,193 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: 
LRU Stats: total=9.82 MB, free=1.16 GB, max=1.17 GB, blocks=0, accesses=0, 
hits=0, hitRatio=?%, cachingAccesses=0, cachingHits=0, cachingHitsRatio=?%, 
evictions=0, evicted=0, evictedPerRun=NaN
2012-01-04 19:14:13,473 DEBUG 
org.apache.hadoop.hbase.regionserver.HRegionServer: NotServingRegionException; 
Region is not online: -ROOT-,,0
2012-01-04 19:14:21,480 DEBUG 
org.apache.hadoop.hbase.regionserver.HRegionServer: NotServingRegionException; 
Region is not online: -ROOT-,,0
2012-01-04 19:14:29,487 DEBUG 
org.apache.hadoop.hbase.regionserver.HRegionServer: NotServingRegionException; 
Region is not online: -ROOT-,,0
2012-01-04 19:14:37,493 DEBUG 
org.apache.hadoop.hbase.regionserver.HRegionServer: NotServingRegionException; 
Region is not online: -ROOT-,,0
2012-01-04 19:14:41,463 INFO 
org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: Exiting; cluster 
shutdown set and not carrying any regions
2012-01-04 19:14:41,463 INFO org.apache.hadoop.ipc.HBaseServer: Stopping server 
on 60020
2012-01-04 19:14:41,463 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server 
handler 4 on 60020: exiting
2012-01-04 19:14:41,463 INFO org.apache.hadoop.ipc.HBaseServer: PRI IPC Server 
handler 2 on 60020: exiting
2012-01-04 19:14:41,463 INFO org.apache.hadoop.ipc.HBaseServer: PRI IPC Server 
handler 3 on 60020: exiting
2012-01-04 19:14:41,463 INFO org.apache.hadoop.ipc.HBaseServer: PRI IPC Server 
handler 4 on 60020: exiting
2012-01-04 19:14:41,463 INFO org.apache.hadoop.ipc.HBaseServer: PRI IPC Server 
handler 6 on 60020: exiting
2012-01-04 19:14:41,463 INFO org.apache.hadoop.ipc.HBaseServer: PRI IPC Server 
handler 5 on 60020: exiting
2012-01-04 19:14:41,463 INFO org.apache.hadoop.ipc.HBaseServer: Stopping IPC 
Server Responder
2012-01-04 19:14:41,464 INFO org.apache.hadoop.ipc.HBaseServer: PRI IPC Server 
handler 9 on 60020: exiting
2012-01-04 19:14:41,463 INFO org.apache.hadoop.ipc.HBaseServer: PRI IPC Server 
handler 1 on 60020: exiting
2012-01-04 19:14:41,464 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server 
handler 0 on 60020: exiting
2012-01-04 19:14:41,463 INFO org.apache.hadoop.ipc.HBaseServer: PRI IPC Server 
handler 0 on 60020: exiting
2012-01-04 19:14:41,464 INFO 
org.apache.hadoop.hbase.regionserver.HRegionServer: Stopping infoServer
2012-01-04 19:14:41,464 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server 
handler 3 on 60020: exiting
2012-01-04 19:14:41,464 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server 
handler 2 on 60020: exiting
2012-01-04 19:14:41,464 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server 
handler 1 on 60020: exiting
2012-01-04 19:14:41,464 INFO org.apache.hadoop.ipc.HBaseServer: PRI IPC Server 
handler 8 on 60020: exiting
2012-01-04 19:14:41,464 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server 
handler 6 on 60020: exiting
2012-01-04 19:14:41,464 INFO org.apache.hadoop.ipc.HBaseServer: PRI IPC Server 
handler 7 on 60020: exiting

2. ZK /hbase/root-region-server is empty


3. Master log has a lot of these errors:

2012-01-04 19:14:13,453 DEBUG org.apache.hadoop.hbase.client.MetaScanner: 
Scanning .META. starting at row= for max=2147483647 rows
2012-01-04 19:14:13,455 DEBUG 
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation: 
Lookedup root region location, 
connection=org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@752a2259;
 hsa=us01-ciqps1-grid04.carrieriq.com:60020
2012-01-04 19:14:13,471 DEBUG 
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation: 
locateRegionInMeta parentTable=-ROOT-, metaLocation=address: 
us01-ciqps1-grid04.carrieriq.com:60020, regioninfo: -ROOT-,,0.70236052, 
attempt=0 of 20 failed; retrying after sleep of 8000 because: 
org.apache.hadoop.hbase.NotServingRegionException: Region is not online: 
-ROOT-,,0
        at 
org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:2359)
        at 
org.apache.hadoop.hbase.regionserver.HRegionServer.getClosestRowBefore(HRegionServer.java:1645)
        at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:570)
        at 
org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1039)

2012-01-04 19:14:13,472 DEBUG 
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation: 
Lookedup root region location, 
connection=org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@752a2259;
 hsa=us01-ciqps1-grid04.carrieriq.com:60020
2012-01-04 19:14:13,632 DEBUG org.apache.hadoop.hbase.util.FSUtils: Failed 
fs.recoverLease invocation, java.lang.NoSuchMethodException: 
org.apache.hadoop.hdfs.DistributedFileSystem.recoverLease(org.apache.hadoop.fs.Path),
 trying fs.append instead
2012-01-04 19:14:13,634 WARN org.apache.hadoop.hbase.util.FSUtils: Waited 
763847ms for lease recovery on 
hdfs://us01-ciqps1-name01.carrieriq.com:9000/hbase/.logs/us01-ciqps1-grid07.carrieriq.com,60020,1325640817081/us01-ciqps1-grid07.carrieriq.com%3A60020.1325640817575:org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException:
 failed to create file 
/hbase/.logs/us01-ciqps1-grid07.carrieriq.com,60020,1325640817081/us01-ciqps1-grid07.carrieriq.com%3A60020.1325640817575
 for DFSClient_hb_m_us01-ciqps1-name01.carrieriq.com:60000_1325703680503 on 
client 10.202.50.100, because this file is already being created by NN_Recovery 
on 10.202.50.100
        at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:1093)
        at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFile(FSNamesystem.java:1181)
        at 
org.apache.hadoop.hdfs.server.namenode.NameNode.append(NameNode.java:422)


From the log files on grid04 and on the Master I cannot find who is trying to
assign -ROOT-. It looks like the Master thinks -ROOT- is on grid04, and grid04
thinks it is on some other node. That is why grid04 waits for -ROOT- to come
online (which is not going to happen) and then shuts itself down. The Master,
in turn, waits forever for grid04 to become available (but it is already down).
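
To make the sequence on grid04 easier to see, here is a rough Python sketch
that re-joins the wrapped log lines and counts the -ROOT- lookups that failed
before the region server stopped. The function names (join_wrapped, summarize)
and the SAMPLE text are only illustrative; the sample is a few records copied
from the grid04 log above, and I am assuming every record starts with a
timestamp in the format shown there.

```python
import re
from datetime import datetime

# Assumption: every log record starts with a timestamp like
# "2012-01-04 19:06:56,949 " and wrapped continuation lines do not.
TS = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3} ")


def join_wrapped(lines):
    """Re-join records that were wrapped across several lines."""
    records = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines between records
        if TS.match(line) or not records:
            records.append(line)
        else:
            records[-1] += " " + line
    return records


def summarize(lines):
    """Count failed -ROOT- lookups and find when the server stopped."""
    misses = []
    stopped_at = None
    for rec in join_wrapped(lines):
        m = TS.match(rec)
        if not m:
            continue  # leading text that is not a log record
        when = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
        if "NotServingRegionException" in rec and "-ROOT-,,0" in rec:
            misses.append(when)
        elif "STOPPED:" in rec and stopped_at is None:
            stopped_at = when
    return {
        "root_misses": len(misses),
        "first_miss": misses[0] if misses else None,
        "last_miss": misses[-1] if misses else None,
        "stopped_at": stopped_at,
    }


# A few records copied from the grid04 region server log (hypothetical input;
# in practice you would read the real log file instead).
SAMPLE = """\
2012-01-04 19:14:29,487 DEBUG
org.apache.hadoop.hbase.regionserver.HRegionServer: NotServingRegionException;
Region is not online: -ROOT-,,0
2012-01-04 19:14:37,493 DEBUG
org.apache.hadoop.hbase.regionserver.HRegionServer: NotServingRegionException;
Region is not online: -ROOT-,,0
2012-01-04 19:14:41,463 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: Exiting; cluster
shutdown set and not carrying any regions
"""

if __name__ == "__main__":
    print(summarize(SAMPLE.splitlines()))
```

Running the same summary over the Master log would need a different match
string (for example the "locateRegionInMeta" retries), so treat this only as a
starting point.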

Best regards,
Vladimir Rodionov
Principal Platform Engineer
Carrier IQ, www.carrieriq.com
e-mail: [email protected]

________________________________________
From: [email protected] [[email protected]] On Behalf Of Stack 
[[email protected]]
Sent: Wednesday, January 04, 2012 1:09 PM
To: [email protected]
Subject: Re: -ROOT- is offline

On Wed, Jan 4, 2012 at 12:08 PM, Vladimir Rodionov
<[email protected]> wrote:

It thinks -ROOT- is here: us01-ciqps1-grid04.carrieriq.com:60020

Is that server up?  Does it have root?  When your cluster starts, does
master log show it trying to assign the -ROOT- or is it having same
issue?

Is your zk up?  If so, when you look at the /hbase/root-region-server,
what does it have in it?  Is it the above server?

To look, you can bring up zk cli doing something like below:

hbase zkcli -server 3.zookeeper:2181

You could try removing the root-region-server znode and try a restart.

St.Ack

