This is the cause:

org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server serverName=sjc1-hadoop1.sjc1.carrieriq.com,60020,1294856823378, load=(requests=0, regions=6, usedHeap=514, maxHeap=3983): regionserver:60020-0x12d7b7b1c760004 regionserver:60020-0x12d7b7b1c760004 received expired from ZooKeeper, aborting org.apache.zookeeper.KeeperException$SessionExpiredException:

Why did the session expire? Typically it's GC; what do your GC logs say? Otherwise, network issues perhaps? Swapping? Other machine-related system problems?

-ryan

On Fri, Jan 14, 2011 at 3:04 PM, Ted Yu <yuzhih...@gmail.com> wrote:
> I ran 0.90 RC3 in dev cluster.
>
> I saw the following in region server log:
>
> Caused by: org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hbase.YouAreDeadException: Server REPORT rejected; currently processing sjc1-hadoop1.sjc1.carrieriq.com,60020,1294856823378 as dead server
>     at org.apache.hadoop.hbase.master.ServerManager.checkIsDead(ServerManager.java:197)
>     at org.apache.hadoop.hbase.master.ServerManager.regionServerReport(ServerManager.java:247)
>     at org.apache.hadoop.hbase.master.HMaster.regionServerReport(HMaster.java:648)
>     at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:570)
>     at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1036)
>
>     at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:753)
>     at org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:257)
>     at $Proxy0.regionServerReport(Unknown Source)
>     at org.apache.hadoop.hbase.regionserver.HRegionServer.tryRegionServerReport(HRegionServer.java:702)
>     ... 2 more
> 2011-01-13 03:55:08,982 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=sjc1-hadoop0.sjc1.carrieriq.com:2181sessionTimeout=90000 watcher=hconnection
> 2011-01-13 03:55:08,914 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server serverName=sjc1-hadoop1.sjc1.carrieriq.com,60020,1294856823378, load=(requests=0, regions=6, usedHeap=514, maxHeap=3983): regionserver:60020-0x12d7b7b1c760004 regionserver:60020-0x12d7b7b1c760004 received expired from ZooKeeper, aborting
> org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired
>     at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:328)
>     at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:246)
>     at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:530)
>     at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:506)
>
> ---------------
>
> And the following from master log:
>
> 2011-01-13 03:52:42,003 INFO org.apache.hadoop.hbase.zookeeper.RegionServerTracker: RegionServer ephemeral node deleted, processing expiration [sjc1-hadoop1.sjc1.carrieriq.com,60020,1294856823378]
> 2011-01-13 03:52:42,005 DEBUG org.apache.hadoop.hbase.master.ServerManager: Added=sjc1-hadoop1.sjc1.carrieriq.com,60020,1294856823378 to dead servers, submitted shutdown handler to be executed, root=false, meta=false
> 2011-01-13 03:52:42,005 INFO org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Splitting logs for sjc1-hadoop1.sjc1.carrieriq.com,60020,1294856823378
> 2011-01-13 03:52:42,092 INFO org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Splitting 1 hlog(s) in hdfs://sjc1-hadoop0.sjc1.carrieriq.com:9000/hbase/.logs/sjc1-hadoop1.sjc1.carrieriq.com,60020,1294856823378
> 2011-01-13 03:52:42,093 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Writer thread Thread[WriterThread-0,5,main]: starting
> 2011-01-13 03:52:42,094 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Writer thread Thread[WriterThread-1,5,main]: starting
> 2011-01-13 03:52:42,096 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Splitting hlog 1 of 1: hdfs://sjc1-hadoop0.sjc1.carrieriq.com:9000/hbase/.logs/sjc1-hadoop1.sjc1.carrieriq.com,60020,1294856823378/sjc1-hadoop1.sjc1.carrieriq.com%3A60020.1294860449407, length=0
>
> Please advise what could be the cause.
>
> Thanks
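
One way to narrow this down is to compare the timestamps in the two quoted logs: when the master saw the ephemeral node deleted versus when the region server logged the expired session. The following is only a minimal sketch, assuming the clocks on both machines are roughly in sync (the logs do not confirm this). The helper names are illustrative and not part of HBase.

```python
from datetime import datetime

# Timestamp layout used by the quoted logs, e.g. "2011-01-13 03:52:42,003".
LOG_TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def parse_log_timestamp(text):
    """Parse a log timestamp such as '2011-01-13 03:52:42,003'."""
    return datetime.strptime(text, LOG_TS_FORMAT)


def gap_seconds(earlier, later):
    """Return the number of seconds between two log timestamps."""
    delta = parse_log_timestamp(later) - parse_log_timestamp(earlier)
    return delta.total_seconds()


# Values copied from the quoted logs.
MASTER_SAW_NODE_DELETED = "2011-01-13 03:52:42,003"
REGIONSERVER_ABORTED = "2011-01-13 03:55:08,914"

if __name__ == "__main__":
    gap = gap_seconds(MASTER_SAW_NODE_DELETED, REGIONSERVER_ABORTED)
    print(f"Region server noticed the expiry {gap:.3f} s after the master")
```

With these values the gap is roughly 147 seconds. The region server learning of the expiry that long after the master would be consistent with the process being stalled (a long GC pause or swapping), though a network problem could look the same. Note that sessionTimeout=90000 in the log is the timeout the client requested; the ZooKeeper server may negotiate a lower value.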