No, ZooKeeper only handles the master election; it does not start or restart masters, so you must start the other masters yourself. See http://wiki.apache.org/hadoop/Hbase/MultipleMasters
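
As a rough sketch of what "start other masters yourself" can look like on a spare node: the path /opt/hbase below is only a placeholder for wherever HBase is installed on that machine, and I'm assuming the stock bin/hbase-daemon.sh script is present. A master started this way should sit idle until the active master's znode goes away, then take over.

```shell
# HBASE_HOME is an assumption: point it at the HBase install on the spare node.
HBASE_HOME="${HBASE_HOME:-/opt/hbase}"
DAEMON="$HBASE_HOME/bin/hbase-daemon.sh"

if [ -x "$DAEMON" ]; then
    # Start an additional master process on this node.
    "$DAEMON" start master
else
    # Nothing is started if the script is missing; just report it.
    echo "hbase-daemon.sh not found under $HBASE_HOME; set HBASE_HOME first" >&2
fi
```

Run it on one or two nodes other than the one hosting the current master, so that a standby is already waiting when the active one dies.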
To improve that you can add more servers to hbase.zookeeper.quorum, raise zookeeper.session.timeout above 1 minute (the current default), and make sure that the servers hosting ZK aren't CPU- and memory-starved (a typical case is having only 2 CPUs for the datanode, region server and ZooKeeper, plus an MR job running).

J-D

On Tue, Aug 25, 2009 at 2:30 AM, Zheng Lv<[email protected]> wrote:
> Hello,
> Thanks, J-D.
> We did the same test 3 days ago and got the same result: the master
> killed itself after running for 2 days. Now we have 2 questions.
> 1 Is it normal that the master killed itself so quickly? And if not,
> what can we do to improve it?
> 2 "Starting a Master on any node should be ok to recover, HBase is built
> for that."
> Did you mean a master should be started automatically, or should we
> start a master ourselves? By the way, what does ZK do? We thought ZK is
> responsible for restarting a master when the old one is dead. Is it?
>
> Thank you,
> LvZheng.
>
> 2009/8/16 Zheng Lv <[email protected]>
>
>> Hello,
>> Thank you for your suggestions.
>> Several days ago we found that our routing table had some problems; after
>> adjusting it we are now sure that the bandwidth is OK.
>> And we have used LZO compression.
>> So we started the test program again, but after running normally for 23
>> hours, the master killed itself. Following is part of the log.
>> By the way, this time we inserted only 10 webpages per second.
>> 2009-08-14 13:36:31,840 INFO org.apache.hadoop.hbase.master.ServerManager: 4 region servers, 0 dead, average load 48.75
>> 2009-08-14 13:36:32,016 INFO org.apache.hadoop.hbase.master.BaseScanner: RegionManager.metaScanner scanning meta region {server: 192.168.33.5:60020, regionname: .META.,,1, startKey: <>}
>> 2009-08-14 13:36:32,076 INFO org.apache.hadoop.hbase.master.BaseScanner: RegionManager.rootScanner scanning meta region {server: 192.168.33.6:60020, regionname: -ROOT-,,0, startKey: <>}
>> 2009-08-14 13:36:32,084 INFO org.apache.hadoop.hbase.master.BaseScanner: RegionManager.rootScanner scan of 1 row(s) of meta region {server: 192.168.33.6:60020, regionname: -ROOT-,,0, startKey: <>} complete
>> 2009-08-14 13:36:32,316 INFO org.apache.hadoop.hbase.master.BaseScanner: RegionManager.metaScanner scan of 193 row(s) of meta region {server: 192.168.33.5:60020, regionname: .META.,,1, startKey: <>} complete
>> 2009-08-14 13:36:32,316 INFO org.apache.hadoop.hbase.master.BaseScanner: All 1 .META. region(s) scanned
>> 2009-08-14 13:37:00,366 WARN org.apache.zookeeper.ClientCnxn: Exception closing session 0x22313002be80001 to sun.nio.ch.selectionkeyi...@4a407c9f
>> java.io.IOException: Read error rc = -1 java.nio.DirectByteBuffer[pos=0 lim=4 cap=4]
>>     at org.apache.zookeeper.ClientCnxn$SendThread.doIO(ClientCnxn.java:653)
>>     at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:897)
>> 2009-08-14 13:37:00,881 INFO org.apache.zookeeper.ClientCnxn: Attempting connection to server ubuntu3/192.168.33.8:2222
>> 2009-08-14 13:37:04,366 WARN org.apache.zookeeper.ClientCnxn: Exception closing session 0x22313002be80000 to sun.nio.ch.selectionkeyi...@4ac6ee33
>> java.io.IOException: Read error rc = -1 java.nio.DirectByteBuffer[pos=0 lim=4 cap=4]
>>     at org.apache.zookeeper.ClientCnxn$SendThread.doIO(ClientCnxn.java:653)
>>     at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:897)
>> 2009-08-14 13:37:04,721 INFO org.apache.zookeeper.ClientCnxn: Attempting connection to server ubuntu2/192.168.33.9:2222
>> 2009-08-14 13:37:08,872 WARN org.apache.zookeeper.ClientCnxn: Exception closing session 0x22313002be80001 to sun.nio.ch.selectionkeyi...@2e93ebe0
>> java.io.IOException: TIMED OUT
>>     at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:858)
>> 2009-08-14 13:37:08,873 WARN org.apache.zookeeper.ClientCnxn: Ignoring exception during shutdown output
>> java.net.SocketException: Transport endpoint is not connected
>>     at sun.nio.ch.SocketChannelImpl.shutdown(Native Method)
>>     at sun.nio.ch.SocketChannelImpl.shutdownOutput(SocketChannelImpl.java:651)
>>     at sun.nio.ch.SocketAdaptor.shutdownOutput(SocketAdaptor.java:368)
>>     at org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:956)
>>     at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:922)
>> 2009-08-14 13:37:09,486 INFO org.apache.zookeeper.ClientCnxn: Attempting connection to server ubuntu2/192.168.33.9:2222
>> 2009-08-14 13:37:12,712 WARN org.apache.zookeeper.ClientCnxn: Exception closing session 0x22313002be80000 to sun.nio.ch.selectionkeyi...@7162d703
>> java.io.IOException: TIMED OUT
>>     at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:858)
>> 2009-08-14 13:37:12,713 WARN org.apache.zookeeper.ClientCnxn: Ignoring exception during shutdown output
>> java.net.SocketException: Transport endpoint is not connected
>>     at sun.nio.ch.SocketChannelImpl.shutdown(Native Method)
>>     at sun.nio.ch.SocketChannelImpl.shutdownOutput(SocketChannelImpl.java:651)
>>     at sun.nio.ch.SocketAdaptor.shutdownOutput(SocketAdaptor.java:368)
>>     at org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:956)
>>     at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:922)
>> 2009-08-14 13:37:13,032 INFO org.apache.zookeeper.ClientCnxn: Attempting connection to server ubuntu3/192.168.33.8:2222
>> 2009-08-14 13:37:17,482 WARN org.apache.zookeeper.ClientCnxn: Exception closing session 0x22313002be80001 to sun.nio.ch.selectionkeyi...@1012401d
>> java.io.IOException: TIMED OUT
>>     at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:858)
>> 2009-08-14 13:37:17,483 WARN org.apache.zookeeper.ClientCnxn: Ignoring exception during shutdown output
>> java.net.SocketException: Transport endpoint is not connected
>>     at sun.nio.ch.SocketChannelImpl.shutdown(Native Method)
>>     at sun.nio.ch.SocketChannelImpl.shutdownOutput(SocketChannelImpl.java:651)
>>     at sun.nio.ch.SocketAdaptor.shutdownOutput(SocketAdaptor.java:368)
>>     at org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:956)
>>     at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:922)
>> 2009-08-14 13:37:17,856 INFO org.apache.zookeeper.ClientCnxn: Attempting connection to server ubuntu7/192.168.33.6:2222
>> 2009-08-14 13:37:19,445 INFO org.apache.zookeeper.ClientCnxn: Priming connection to java.nio.channels.SocketChannel[connected local=/192.168.33.7:40923 remote=ubuntu7/192.168.33.6:2222]
>> 2009-08-14 13:37:19,445 INFO org.apache.zookeeper.ClientCnxn: Server connection successful
>> 2009-08-14 13:37:21,022 WARN org.apache.zookeeper.ClientCnxn: Exception closing session 0x22313002be80000 to sun.nio.ch.selectionkeyi...@2e101b3a
>> java.io.IOException: TIMED OUT
>>     at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:858)
>> 2009-08-14 13:37:21,023 WARN org.apache.zookeeper.ClientCnxn: Ignoring exception during shutdown output
>> java.net.SocketException: Transport endpoint is not connected
>>     at sun.nio.ch.SocketChannelImpl.shutdown(Native Method)
>>     at sun.nio.ch.SocketChannelImpl.shutdownOutput(SocketChannelImpl.java:651)
>>     at sun.nio.ch.SocketAdaptor.shutdownOutput(SocketAdaptor.java:368)
>>     at org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:956)
>>     at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:922)
>> 2009-08-14 13:37:21,908 INFO org.apache.zookeeper.ClientCnxn: Attempting connection to server ubuntu7/192.168.33.6:2222
>> 2009-08-14 13:37:21,908 INFO org.apache.zookeeper.ClientCnxn: Priming connection to java.nio.channels.SocketChannel[connected local=/192.168.33.7:40926 remote=ubuntu7/192.168.33.6:2222]
>> 2009-08-14 13:37:21,909 INFO org.apache.zookeeper.ClientCnxn: Server connection successful
>> 2009-08-14 13:37:21,911 WARN org.apache.zookeeper.ClientCnxn: Exception closing session 0x22313002be80000 to sun.nio.ch.selectionkeyi...@6bdfe124
>> java.io.IOException: Session Expired
>>     at org.apache.zookeeper.ClientCnxn$SendThread.readConnectResult(ClientCnxn.java:548)
>>     at org.apache.zookeeper.ClientCnxn$SendThread.doIO(ClientCnxn.java:661)
>>     at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:897)
>> 2009-08-14 13:37:21,912 ERROR org.apache.hadoop.hbase.master.HMaster: Master lost its znode, killing itself now
>> Regards,
>> LvZheng
>>
>
