It looks like your DFS NameNode became unavailable at about the same time the ZooKeeper timeouts started. Could it be overloaded? Is there anything relevant in the NameNode logs?
- Andy

________________________________
From: Lucas Nazário dos Santos <[email protected]>
To: [email protected]
Sent: Wed, October 7, 2009 9:43:49 AM
Subject: HBase crashed: FATAL HMaster: Shutting down HBase cluster: file system not available

Hello,

My HBase cluster crashed today after running for a couple of days, and the logs show the exception below (at the end of this message). Some log excerpts that caught my attention are:

2009-10-07 11:25:17,032 ERROR org.apache.hadoop.hbase.master.HMaster: Master lost its znode, killing itself now
2009-10-07 11:25:17,174 FATAL org.apache.hadoop.hbase.master.HMaster: Shutting down HBase cluster: file system not available

Any clue as to what happened? What could I do to prevent this from occurring in the future?

Thanks!
Lucas

2009-10-07 11:24:42,823 INFO org.apache.hadoop.hbase.master.BaseScanner: RegionManager.metaScanner scan of 9 row(s) of meta region {server: 192.168.1.3:60020, regionname: .META.,,1, startKey: <>} complete
2009-10-07 11:24:42,823 INFO org.apache.hadoop.hbase.master.BaseScanner: All 1 .META. region(s) scanned
2009-10-07 11:25:06,311 WARN org.apache.zookeeper.ClientCnxn: Exception closing session 0x1242b188e8a0001 to sun.nio.ch.selectionkeyi...@148c02f
java.io.IOException: TIMED OUT
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:858)
2009-10-07 11:25:06,702 INFO org.apache.zookeeper.ClientCnxn: Attempting connection to server server2/192.168.1.3:2181
2009-10-07 11:25:06,702 INFO org.apache.zookeeper.ClientCnxn: Priming connection to java.nio.channels.SocketChannel[connected local=/192.168.1.3:49602 remote=server2/192.168.1.3:2181]
2009-10-07 11:25:06,703 INFO org.apache.zookeeper.ClientCnxn: Server connection successful
2009-10-07 11:25:16,911 WARN org.apache.zookeeper.ClientCnxn: Exception closing session 0x242b1890c70000 to sun.nio.ch.selectionkeyi...@1060478
java.io.IOException: Read error rc = -1 java.nio.DirectByteBuffer[pos=0 lim=4 cap=4]
    at org.apache.zookeeper.ClientCnxn$SendThread.doIO(ClientCnxn.java:653)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:897)
2009-10-07 11:25:16,911 INFO org.apache.hadoop.hbase.master.ServerManager: server2,60020,1254853514050 znode expired
2009-10-07 11:25:17,021 INFO org.apache.hadoop.hbase.master.RegionManager: META region removed from onlineMetaRegions
2009-10-07 11:25:17,032 ERROR org.apache.hadoop.hbase.master.HMaster: Master lost its znode, killing itself now
2009-10-07 11:25:17,032 INFO org.apache.hadoop.hbase.master.RegionServerOperation: process shutdown of server server2,60020,1254853514050: logSplit: false, rootRescanned: false, numberOfMetaRegions: 1, onlineMetaRegions.size(): 0
2009-10-07 11:25:17,174 FATAL org.apache.hadoop.hbase.master.HMaster: Shutting down HBase cluster: file system not available
java.io.IOException: File system is not available
    at org.apache.hadoop.hbase.util.FSUtils.checkFileSystemAvailable(FSUtils.java:125)
    at org.apache.hadoop.hbase.master.HMaster.checkFileSystem(HMaster.java:324)
    at org.apache.hadoop.hbase.master.HMaster.processToDoQueue(HMaster.java:525)
    at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:426)
Caused by: java.io.IOException: Filesystem closed
    at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:197)
    at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:585)
    at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:453)
    at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:643)
    at org.apache.hadoop.hbase.util.FSUtils.checkFileSystemAvailable(FSUtils.java:114)
    ... 3 more
2009-10-07 11:25:17,174 INFO org.apache.hadoop.hbase.master.HMaster: Stopping infoServer
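
One way to act on the suggestion to check the NameNode logs is to pull out only the WARN, ERROR, and FATAL entries from around the time of the crash. The following is a minimal Python sketch, assuming the NameNode log uses the same timestamp prefix as the HMaster excerpts in this thread. The function name `suspicious_lines` and the file name `namenode.log` mentioned below are hypothetical, chosen only for illustration.

```python
import re
from datetime import datetime, timedelta

# Matches the timestamp/level prefix seen in the HMaster excerpts, e.g.
# "2009-10-07 11:25:17,174 FATAL org.apache...HMaster: message".
# Assumption: the NameNode log uses the same layout.
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),(\d{3}) (\w+) (.*)$"
)


def suspicious_lines(lines, center, window_seconds=60,
                     levels=("WARN", "ERROR", "FATAL")):
    """Return (timestamp, level, rest) for entries near `center`.

    Only entries whose level is in `levels` and whose timestamp lies
    within `window_seconds` of `center` are kept.
    """
    window = timedelta(seconds=window_seconds)
    hits = []
    for line in lines:
        match = LINE_RE.match(line.rstrip("\n"))
        if not match:
            # Stack-trace continuation lines carry no timestamp; skip them.
            continue
        ts = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
        ts += timedelta(milliseconds=int(match.group(2)))
        if match.group(3) in levels and abs(ts - center) <= window:
            hits.append((ts, match.group(3), match.group(4)))
    return hits
```

For example, passing the lines of a (hypothetical) `namenode.log` file together with `datetime(2009, 10, 7, 11, 25, 17)` as `center` would list the NameNode warnings and errors logged within a minute of the HMaster shutdown, which can then be compared against the ZooKeeper timeouts above.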
