Thank you! We will give it a try.

2009/8/25 Jean-Daniel Cryans <[email protected]>

> No, ZooKeeper only handles the master election, so you must start the
> other masters yourself. See
> http://wiki.apache.org/hadoop/Hbase/MultipleMasters
>
> To improve that you can add more servers to hbase.zookeeper.quorum,
> change the zookeeper.session.timeout to something higher than 1 minute
> (current default) and make sure that the servers hosting ZK aren't CPU
> and mem starved (typical case is having only 2 CPUs for
> datanode/region server/zookeeper plus a MR job running).
>
> J-D
>
> On Tue, Aug 25, 2009 at 2:30 AM, Zheng Lv <[email protected]> wrote:
> > Hello,
> >    Thanks, J-D.
> >    We ran the same test 3 days ago and got the same result: the master
> > killed itself after running for 2 days. Now we have 2 questions.
> >    1 Is it normal for the master to kill itself so quickly? If not,
> > what can we do to improve it?
> >    2 "Starting a Master on any node should be ok to recover, HBase is
> > built for that."
> >       Did you mean that a master should start automatically, or that we
> > should start one ourselves? By the way, what does ZK do? We thought ZK
> > was responsible for restarting a master when the old one dies. Is it?
> >
> >    Thank you,
> >    LvZheng.
> >
> > 2009/8/16 Zheng Lv <[email protected]>
> >
> >> Hello,
> >>     Thank you for your suggestions.
> >>     Several days ago we found that our routing table had some problems;
> >> after adjusting it, we are now sure that the bandwidth is OK.
> >>     We are also using LZO compression.
> >>     So we started the test program again, but after running normally for
> >> 23 hours, the master killed itself. Part of the log follows.
> >>     By the way, this time we inserted only 10 webpages per second.
> >> 2009-08-14 13:36:31,840 INFO org.apache.hadoop.hbase.master.ServerManager: 4 region servers, 0 dead, average load 48.75
> >> 2009-08-14 13:36:32,016 INFO org.apache.hadoop.hbase.master.BaseScanner: RegionManager.metaScanner scanning meta region {server: 192.168.33.5:60020, regionname: .META.,,1, startKey: <>}
> >> 2009-08-14 13:36:32,076 INFO org.apache.hadoop.hbase.master.BaseScanner: RegionManager.rootScanner scanning meta region {server: 192.168.33.6:60020, regionname: -ROOT-,,0, startKey: <>}
> >> 2009-08-14 13:36:32,084 INFO org.apache.hadoop.hbase.master.BaseScanner: RegionManager.rootScanner scan of 1 row(s) of meta region {server: 192.168.33.6:60020, regionname: -ROOT-,,0, startKey: <>} complete
> >> 2009-08-14 13:36:32,316 INFO org.apache.hadoop.hbase.master.BaseScanner: RegionManager.metaScanner scan of 193 row(s) of meta region {server: 192.168.33.5:60020, regionname: .META.,,1, startKey: <>} complete
> >> 2009-08-14 13:36:32,316 INFO org.apache.hadoop.hbase.master.BaseScanner: All 1 .META. region(s) scanned
> >> 2009-08-14 13:37:00,366 WARN org.apache.zookeeper.ClientCnxn: Exception closing session 0x22313002be80001 to sun.nio.ch.selectionkeyi...@4a407c9f
> >> java.io.IOException: Read error rc = -1 java.nio.DirectByteBuffer[pos=0 lim=4 cap=4]
> >>         at org.apache.zookeeper.ClientCnxn$SendThread.doIO(ClientCnxn.java:653)
> >>         at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:897)
> >> 2009-08-14 13:37:00,881 INFO org.apache.zookeeper.ClientCnxn: Attempting connection to server ubuntu3/192.168.33.8:2222
> >> 2009-08-14 13:37:04,366 WARN org.apache.zookeeper.ClientCnxn: Exception closing session 0x22313002be80000 to sun.nio.ch.selectionkeyi...@4ac6ee33
> >> java.io.IOException: Read error rc = -1 java.nio.DirectByteBuffer[pos=0 lim=4 cap=4]
> >>         at org.apache.zookeeper.ClientCnxn$SendThread.doIO(ClientCnxn.java:653)
> >>         at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:897)
> >> 2009-08-14 13:37:04,721 INFO org.apache.zookeeper.ClientCnxn: Attempting connection to server ubuntu2/192.168.33.9:2222
> >> 2009-08-14 13:37:08,872 WARN org.apache.zookeeper.ClientCnxn: Exception closing session 0x22313002be80001 to sun.nio.ch.selectionkeyi...@2e93ebe0
> >> java.io.IOException: TIMED OUT
> >>         at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:858)
> >> 2009-08-14 13:37:08,873 WARN org.apache.zookeeper.ClientCnxn: Ignoring exception during shutdown output
> >> java.net.SocketException: Transport endpoint is not connected
> >>         at sun.nio.ch.SocketChannelImpl.shutdown(Native Method)
> >>         at sun.nio.ch.SocketChannelImpl.shutdownOutput(SocketChannelImpl.java:651)
> >>         at sun.nio.ch.SocketAdaptor.shutdownOutput(SocketAdaptor.java:368)
> >>         at org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:956)
> >>         at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:922)
> >> 2009-08-14 13:37:09,486 INFO org.apache.zookeeper.ClientCnxn: Attempting connection to server ubuntu2/192.168.33.9:2222
> >> 2009-08-14 13:37:12,712 WARN org.apache.zookeeper.ClientCnxn: Exception closing session 0x22313002be80000 to sun.nio.ch.selectionkeyi...@7162d703
> >> java.io.IOException: TIMED OUT
> >>         at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:858)
> >> 2009-08-14 13:37:12,713 WARN org.apache.zookeeper.ClientCnxn: Ignoring exception during shutdown output
> >> java.net.SocketException: Transport endpoint is not connected
> >>         at sun.nio.ch.SocketChannelImpl.shutdown(Native Method)
> >>         at sun.nio.ch.SocketChannelImpl.shutdownOutput(SocketChannelImpl.java:651)
> >>         at sun.nio.ch.SocketAdaptor.shutdownOutput(SocketAdaptor.java:368)
> >>         at org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:956)
> >>         at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:922)
> >> 2009-08-14 13:37:13,032 INFO org.apache.zookeeper.ClientCnxn: Attempting connection to server ubuntu3/192.168.33.8:2222
> >> 2009-08-14 13:37:17,482 WARN org.apache.zookeeper.ClientCnxn: Exception closing session 0x22313002be80001 to sun.nio.ch.selectionkeyi...@1012401d
> >> java.io.IOException: TIMED OUT
> >>         at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:858)
> >> 2009-08-14 13:37:17,483 WARN org.apache.zookeeper.ClientCnxn: Ignoring exception during shutdown output
> >> java.net.SocketException: Transport endpoint is not connected
> >>         at sun.nio.ch.SocketChannelImpl.shutdown(Native Method)
> >>         at sun.nio.ch.SocketChannelImpl.shutdownOutput(SocketChannelImpl.java:651)
> >>         at sun.nio.ch.SocketAdaptor.shutdownOutput(SocketAdaptor.java:368)
> >>         at org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:956)
> >>         at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:922)
> >> 2009-08-14 13:37:17,856 INFO org.apache.zookeeper.ClientCnxn: Attempting connection to server ubuntu7/192.168.33.6:2222
> >> 2009-08-14 13:37:19,445 INFO org.apache.zookeeper.ClientCnxn: Priming connection to java.nio.channels.SocketChannel[connected local=/192.168.33.7:40923 remote=ubuntu7/192.168.33.6:2222]
> >> 2009-08-14 13:37:19,445 INFO org.apache.zookeeper.ClientCnxn: Server connection successful
> >> 2009-08-14 13:37:21,022 WARN org.apache.zookeeper.ClientCnxn: Exception closing session 0x22313002be80000 to sun.nio.ch.selectionkeyi...@2e101b3a
> >> java.io.IOException: TIMED OUT
> >>         at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:858)
> >> 2009-08-14 13:37:21,023 WARN org.apache.zookeeper.ClientCnxn: Ignoring exception during shutdown output
> >> java.net.SocketException: Transport endpoint is not connected
> >>         at sun.nio.ch.SocketChannelImpl.shutdown(Native Method)
> >>         at sun.nio.ch.SocketChannelImpl.shutdownOutput(SocketChannelImpl.java:651)
> >>         at sun.nio.ch.SocketAdaptor.shutdownOutput(SocketAdaptor.java:368)
> >>         at org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:956)
> >>         at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:922)
> >> 2009-08-14 13:37:21,908 INFO org.apache.zookeeper.ClientCnxn: Attempting connection to server ubuntu7/192.168.33.6:2222
> >> 2009-08-14 13:37:21,908 INFO org.apache.zookeeper.ClientCnxn: Priming connection to java.nio.channels.SocketChannel[connected local=/192.168.33.7:40926 remote=ubuntu7/192.168.33.6:2222]
> >> 2009-08-14 13:37:21,909 INFO org.apache.zookeeper.ClientCnxn: Server connection successful
> >> 2009-08-14 13:37:21,911 WARN org.apache.zookeeper.ClientCnxn: Exception closing session 0x22313002be80000 to sun.nio.ch.selectionkeyi...@6bdfe124
> >> java.io.IOException: Session Expired
> >>         at org.apache.zookeeper.ClientCnxn$SendThread.readConnectResult(ClientCnxn.java:548)
> >>         at org.apache.zookeeper.ClientCnxn$SendThread.doIO(ClientCnxn.java:661)
> >>         at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:897)
> >> 2009-08-14 13:37:21,912 ERROR org.apache.hadoop.hbase.master.HMaster: Master lost its znode, killing itself now
> >> Regards,
> >> LvZheng
> >>
> >
>
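
A rough way to see how quickly the master went from its first ZooKeeper
connection error to shutting itself down is to diff the timestamps in the
master log quoted above. The following is only a minimal Python sketch under
stated assumptions: the helper names (parse_ts, seconds_between) are
hypothetical, and the two sample lines are unwrapped and shortened copies of
log entries from the thread.

```python
from datetime import datetime

# Timestamp layout used by the quoted log lines: "2009-08-14 13:37:00,366".
TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"
TS_LENGTH = 23  # number of characters in "YYYY-MM-DD HH:MM:SS,mmm"


def parse_ts(line):
    """Return the leading timestamp of a log line as a datetime."""
    return datetime.strptime(line[:TS_LENGTH], TS_FORMAT)


def seconds_between(first_line, last_line):
    """Elapsed seconds between the timestamps of two log lines."""
    return (parse_ts(last_line) - parse_ts(first_line)).total_seconds()


# Two entries from the quoted log, unwrapped and shortened for illustration.
first_zk_error = (
    "2009-08-14 13:37:00,366 WARN org.apache.zookeeper.ClientCnxn: "
    "Exception closing session 0x22313002be80001"
)
master_abort = (
    "2009-08-14 13:37:21,912 ERROR org.apache.hadoop.hbase.master.HMaster: "
    "Master lost its znode, killing itself now"
)

gap = seconds_between(first_zk_error, master_abort)
print(f"{gap:.3f} s from first ZooKeeper error to master shutdown")
```

On these two entries the gap is about 21.5 seconds. That covers only the part
of the outage visible in the master log; the session timeout is presumably
counted from the last heartbeat the ZooKeeper servers received, which this log
does not show.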
