Thank you! We will give it a try.

2009/8/25 Jean-Daniel Cryans <[email protected]>
> No, ZooKeeper only helps with the master election, so you must start the
> other masters yourself. See
> http://wiki.apache.org/hadoop/Hbase/MultipleMasters
>
> To improve that you can add more servers to hbase.zookeeper.quorum,
> change the zookeeper.session.timeout to something higher than 1 minute
> (current default) and make sure that the servers hosting ZK aren't CPU
> and mem starved (typical case is having only 2 CPUs for
> datanode/region server/zookeeper plus a MR job running).
>
> J-D
>
> On Tue, Aug 25, 2009 at 2:30 AM, Zheng Lv<[email protected]> wrote:
> > Hello,
> > Thanks, J-D.
> > We ran the same test 3 days ago and got the same result: the master
> > killed itself after running for 2 days. Now we have 2 questions.
> > 1. Is it normal that the master killed itself so quickly? And if not,
> > what can we do to improve it?
> > 2. "Starting a Master on any node should be ok to recover, HBase is
> > built for that."
> > Did you mean that a master should be started automatically, or should we
> > start a master ourselves? By the way, what does ZK do? We thought ZK is
> > responsible for restarting a master when the old one is dead. Is it?
> >
> > Thank you,
> > LvZheng.
> >
> > 2009/8/16 Zheng Lv <[email protected]>
> >
> >> Hello,
> >> Thank you for your suggestions.
> >> Several days ago we found that our routing table had some problems;
> >> after adjusting it, we are now sure that the bandwidth is ok.
> >> And we have used LZO compression.
> >> So we started the test program again, but after running normally for 23
> >> hours, the master killed itself. Following is part of the log.
> >> By the way, this time we inserted only 10 webpages per second.
> >> 2009-08-14 13:36:31,840 INFO org.apache.hadoop.hbase.master.ServerManager: 4 region servers, 0 dead, average load 48.75
> >> 2009-08-14 13:36:32,016 INFO org.apache.hadoop.hbase.master.BaseScanner: RegionManager.metaScanner scanning meta region {server: 192.168.33.5:60020, regionname: .META.,,1, startKey: <>}
> >> 2009-08-14 13:36:32,076 INFO org.apache.hadoop.hbase.master.BaseScanner: RegionManager.rootScanner scanning meta region {server: 192.168.33.6:60020, regionname: -ROOT-,,0, startKey: <>}
> >> 2009-08-14 13:36:32,084 INFO org.apache.hadoop.hbase.master.BaseScanner: RegionManager.rootScanner scan of 1 row(s) of meta region {server: 192.168.33.6:60020, regionname: -ROOT-,,0, startKey: <>} complete
> >> 2009-08-14 13:36:32,316 INFO org.apache.hadoop.hbase.master.BaseScanner: RegionManager.metaScanner scan of 193 row(s) of meta region {server: 192.168.33.5:60020, regionname: .META.,,1, startKey: <>} complete
> >> 2009-08-14 13:36:32,316 INFO org.apache.hadoop.hbase.master.BaseScanner: All 1 .META. region(s) scanned
> >> 2009-08-14 13:37:00,366 WARN org.apache.zookeeper.ClientCnxn: Exception closing session 0x22313002be80001 to sun.nio.ch.selectionkeyi...@4a407c9f
> >> java.io.IOException: Read error rc = -1 java.nio.DirectByteBuffer[pos=0 lim=4 cap=4]
> >>         at org.apache.zookeeper.ClientCnxn$SendThread.doIO(ClientCnxn.java:653)
> >>         at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:897)
> >> 2009-08-14 13:37:00,881 INFO org.apache.zookeeper.ClientCnxn: Attempting connection to server ubuntu3/192.168.33.8:2222
> >> 2009-08-14 13:37:04,366 WARN org.apache.zookeeper.ClientCnxn: Exception closing session 0x22313002be80000 to sun.nio.ch.selectionkeyi...@4ac6ee33
> >> java.io.IOException: Read error rc = -1 java.nio.DirectByteBuffer[pos=0 lim=4 cap=4]
> >>         at org.apache.zookeeper.ClientCnxn$SendThread.doIO(ClientCnxn.java:653)
> >>         at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:897)
> >> 2009-08-14 13:37:04,721 INFO org.apache.zookeeper.ClientCnxn: Attempting connection to server ubuntu2/192.168.33.9:2222
> >> 2009-08-14 13:37:08,872 WARN org.apache.zookeeper.ClientCnxn: Exception closing session 0x22313002be80001 to sun.nio.ch.selectionkeyi...@2e93ebe0
> >> java.io.IOException: TIMED OUT
> >>         at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:858)
> >> 2009-08-14 13:37:08,873 WARN org.apache.zookeeper.ClientCnxn: Ignoring exception during shutdown output
> >> java.net.SocketException: Transport endpoint is not connected
> >>         at sun.nio.ch.SocketChannelImpl.shutdown(Native Method)
> >>         at sun.nio.ch.SocketChannelImpl.shutdownOutput(SocketChannelImpl.java:651)
> >>         at sun.nio.ch.SocketAdaptor.shutdownOutput(SocketAdaptor.java:368)
> >>         at org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:956)
> >>         at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:922)
> >> 2009-08-14 13:37:09,486 INFO org.apache.zookeeper.ClientCnxn: Attempting connection to server ubuntu2/192.168.33.9:2222
> >> 2009-08-14 13:37:12,712 WARN org.apache.zookeeper.ClientCnxn: Exception closing session 0x22313002be80000 to sun.nio.ch.selectionkeyi...@7162d703
> >> java.io.IOException: TIMED OUT
> >>         at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:858)
> >> 2009-08-14 13:37:12,713 WARN org.apache.zookeeper.ClientCnxn: Ignoring exception during shutdown output
> >> java.net.SocketException: Transport endpoint is not connected
> >>         at sun.nio.ch.SocketChannelImpl.shutdown(Native Method)
> >>         at sun.nio.ch.SocketChannelImpl.shutdownOutput(SocketChannelImpl.java:651)
> >>         at sun.nio.ch.SocketAdaptor.shutdownOutput(SocketAdaptor.java:368)
> >>         at org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:956)
> >>         at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:922)
> >> 2009-08-14 13:37:13,032 INFO org.apache.zookeeper.ClientCnxn: Attempting connection to server ubuntu3/192.168.33.8:2222
> >> 2009-08-14 13:37:17,482 WARN org.apache.zookeeper.ClientCnxn: Exception closing session 0x22313002be80001 to sun.nio.ch.selectionkeyi...@1012401d
> >> java.io.IOException: TIMED OUT
> >>         at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:858)
> >> 2009-08-14 13:37:17,483 WARN org.apache.zookeeper.ClientCnxn: Ignoring exception during shutdown output
> >> java.net.SocketException: Transport endpoint is not connected
> >>         at sun.nio.ch.SocketChannelImpl.shutdown(Native Method)
> >>         at sun.nio.ch.SocketChannelImpl.shutdownOutput(SocketChannelImpl.java:651)
> >>         at sun.nio.ch.SocketAdaptor.shutdownOutput(SocketAdaptor.java:368)
> >>         at org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:956)
> >>         at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:922)
> >> 2009-08-14 13:37:17,856 INFO org.apache.zookeeper.ClientCnxn: Attempting connection to server ubuntu7/192.168.33.6:2222
> >> 2009-08-14 13:37:19,445 INFO org.apache.zookeeper.ClientCnxn: Priming connection to java.nio.channels.SocketChannel[connected local=/192.168.33.7:40923 remote=ubuntu7/192.168.33.6:2222]
> >> 2009-08-14 13:37:19,445 INFO org.apache.zookeeper.ClientCnxn: Server connection successful
> >> 2009-08-14 13:37:21,022 WARN org.apache.zookeeper.ClientCnxn: Exception closing session 0x22313002be80000 to sun.nio.ch.selectionkeyi...@2e101b3a
> >> java.io.IOException: TIMED OUT
> >>         at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:858)
> >> 2009-08-14 13:37:21,023 WARN org.apache.zookeeper.ClientCnxn: Ignoring exception during shutdown output
> >> java.net.SocketException: Transport endpoint is not connected
> >>         at sun.nio.ch.SocketChannelImpl.shutdown(Native Method)
> >>         at sun.nio.ch.SocketChannelImpl.shutdownOutput(SocketChannelImpl.java:651)
> >>         at sun.nio.ch.SocketAdaptor.shutdownOutput(SocketAdaptor.java:368)
> >>         at org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:956)
> >>         at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:922)
> >> 2009-08-14 13:37:21,908 INFO org.apache.zookeeper.ClientCnxn: Attempting connection to server ubuntu7/192.168.33.6:2222
> >> 2009-08-14 13:37:21,908 INFO org.apache.zookeeper.ClientCnxn: Priming connection to java.nio.channels.SocketChannel[connected local=/192.168.33.7:40926 remote=ubuntu7/192.168.33.6:2222]
> >> 2009-08-14 13:37:21,909 INFO org.apache.zookeeper.ClientCnxn: Server connection successful
> >> 2009-08-14 13:37:21,911 WARN org.apache.zookeeper.ClientCnxn: Exception closing session 0x22313002be80000 to sun.nio.ch.selectionkeyi...@6bdfe124
> >> java.io.IOException: Session Expired
> >>         at org.apache.zookeeper.ClientCnxn$SendThread.readConnectResult(ClientCnxn.java:548)
> >>         at org.apache.zookeeper.ClientCnxn$SendThread.doIO(ClientCnxn.java:661)
> >>         at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:897)
> >> 2009-08-14 13:37:21,912 ERROR org.apache.hadoop.hbase.master.HMaster: Master lost its znode, killing itself now
> >> Regards,
> >> LvZheng
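
To see how long the master was cut off from ZooKeeper before it shut down, the timestamps in the quoted log can be compared directly. Below is a minimal Python sketch of that, not anything shipped with HBase or ZooKeeper. The function names are made up for illustration, and the SAMPLE lines are a handful of records copied from the log above (the first WARN line is cut off after the session id).

```python
import re
from datetime import datetime

# Leading "YYYY-MM-DD HH:MM:SS,mmm LEVEL message" of a log4j-style record.
# Any mail quote markers ("> >> ") in front of the timestamp are skipped.
LINE_RE = re.compile(
    r"^[>\s]*(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+(INFO|WARN|ERROR)\s+(.*)$"
)


def parse_events(lines):
    """Yield (timestamp, level, message) for every line that starts a record.

    Stack-trace lines and other continuation lines do not match and are skipped.
    """
    for line in lines:
        match = LINE_RE.match(line)
        if match:
            stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S,%f")
            yield stamp, match.group(2), match.group(3)


def seconds_from_first_warning_to_error(lines):
    """Seconds between the first WARN and the first ERROR, or None if either is missing."""
    first_warn = first_error = None
    for stamp, level, _message in parse_events(lines):
        if level == "WARN" and first_warn is None:
            first_warn = stamp
        elif level == "ERROR" and first_error is None:
            first_error = stamp
    if first_warn is None or first_error is None:
        return None
    return (first_error - first_warn).total_seconds()


# A few records taken from the log quoted in this thread.
SAMPLE = [
    "> >> 2009-08-14 13:36:32,316 INFO org.apache.hadoop.hbase.master.BaseScanner: All 1 .META. region(s) scanned",
    "> >> 2009-08-14 13:37:00,366 WARN org.apache.zookeeper.ClientCnxn: Exception closing session 0x22313002be80001",
    "> >> java.io.IOException: Read error rc = -1 java.nio.DirectByteBuffer[pos=0 lim=4 cap=4]",
    "> >> 2009-08-14 13:37:19,445 INFO org.apache.zookeeper.ClientCnxn: Server connection successful",
    "> >> 2009-08-14 13:37:21,911 WARN org.apache.zookeeper.ClientCnxn: Exception closing session 0x22313002be80000",
    "> >> java.io.IOException: Session Expired",
    "> >> 2009-08-14 13:37:21,912 ERROR org.apache.hadoop.hbase.master.HMaster: Master lost its znode, killing itself now",
]

if __name__ == "__main__":
    gap = seconds_from_first_warning_to_error(SAMPLE)
    print(f"{gap:.3f} s from first ZooKeeper warning to master shutdown")
```

On this excerpt the gap is roughly 21.5 seconds. That number only covers what is visible in the pasted log; the session may already have been in trouble before the first warning was written, so it is a lower bound on the outage rather than a measurement of the session timeout.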
