No. ZooKeeper only helps with the master election; it does not start
masters for you, so you must start the other masters yourself. See
http://wiki.apache.org/hadoop/Hbase/MultipleMasters

To improve that, you can add more servers to hbase.zookeeper.quorum,
raise zookeeper.session.timeout above 1 minute (the current default),
and make sure that the servers hosting ZK aren't starved for CPU and
memory (a typical case is a machine with only 2 CPUs running a
datanode, a region server and ZooKeeper, plus a MR job).
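
As a rough sketch of the two settings above, an hbase-site.xml fragment
might look like the following. The property names are the ones mentioned
above; everything else is an assumption: ubuntu2, ubuntu3 and ubuntu7
are taken from the quoted log, zk4.example.com and zk5.example.com are
hypothetical extra hosts, and the timeout is assumed to be expressed in
milliseconds, with 120000 picked arbitrarily as "higher than 1 minute".

```xml
<!-- hbase-site.xml: illustrative values only, adjust to your cluster -->
<configuration>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <!-- ubuntu2, ubuntu3, ubuntu7 come from the quoted log;
         the two example.com hosts are hypothetical additions -->
    <value>ubuntu2,ubuntu3,ubuntu7,zk4.example.com,zk5.example.com</value>
  </property>
  <property>
    <name>zookeeper.session.timeout</name>
    <!-- assumed unit: milliseconds; 120000 = 2 minutes (arbitrary) -->
    <value>120000</value>
  </property>
</configuration>
```

An odd number of quorum servers is the usual choice, since ZooKeeper
needs a majority of them to be reachable.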

J-D

On Tue, Aug 25, 2009 at 2:30 AM, Zheng Lv<[email protected]> wrote:
> Hello,
>    Thanks, J-D.
>    We ran the same test 3 days ago and got the same result: the master
> killed itself after running for 2 days. Now we have 2 questions.
>    1 Is it normal for the master to kill itself so quickly? If not,
> what can we do to improve it?
>    2 "Starting a Master on any node should be ok to recover, HBase is built
> for that."
>       Did you mean that a master is started automatically, or that we should
> start a master ourselves? By the way, what does ZK do? We thought ZK was
> responsible for restarting a master when the old one dies. Is it?
>
>    Thank you,
>    LvZheng.
>
> 2009/8/16 Zheng Lv <[email protected]>
>
>> Hello,
>>     Thank you for your suggestions.
>>     Several days ago we found that our routing table had some problems;
>> after adjusting it, we are now sure that the bandwidth is OK.
>>     We have also enabled LZO compression.
>>     So we started the test program again, but after running normally for 23
>> hours, the master killed itself. Part of the log follows.
>>     By the way, this time we inserted only 10 webpages per second.
>> 2009-08-14 13:36:31,840 INFO org.apache.hadoop.hbase.master.ServerManager: 4 region servers, 0 dead, average load 48.75
>> 2009-08-14 13:36:32,016 INFO org.apache.hadoop.hbase.master.BaseScanner: RegionManager.metaScanner scanning meta region {server: 192.168.33.5:60020, regionname: .META.,,1, startKey: <>}
>> 2009-08-14 13:36:32,076 INFO org.apache.hadoop.hbase.master.BaseScanner: RegionManager.rootScanner scanning meta region {server: 192.168.33.6:60020, regionname: -ROOT-,,0, startKey: <>}
>> 2009-08-14 13:36:32,084 INFO org.apache.hadoop.hbase.master.BaseScanner: RegionManager.rootScanner scan of 1 row(s) of meta region {server: 192.168.33.6:60020, regionname: -ROOT-,,0, startKey: <>} complete
>> 2009-08-14 13:36:32,316 INFO org.apache.hadoop.hbase.master.BaseScanner: RegionManager.metaScanner scan of 193 row(s) of meta region {server: 192.168.33.5:60020, regionname: .META.,,1, startKey: <>} complete
>> 2009-08-14 13:36:32,316 INFO org.apache.hadoop.hbase.master.BaseScanner: All 1 .META. region(s) scanned
>> 2009-08-14 13:37:00,366 WARN org.apache.zookeeper.ClientCnxn: Exception closing session 0x22313002be80001 to sun.nio.ch.selectionkeyi...@4a407c9f
>> java.io.IOException: Read error rc = -1 java.nio.DirectByteBuffer[pos=0 lim=4 cap=4]
>>         at org.apache.zookeeper.ClientCnxn$SendThread.doIO(ClientCnxn.java:653)
>>         at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:897)
>> 2009-08-14 13:37:00,881 INFO org.apache.zookeeper.ClientCnxn: Attempting connection to server ubuntu3/192.168.33.8:2222
>> 2009-08-14 13:37:04,366 WARN org.apache.zookeeper.ClientCnxn: Exception closing session 0x22313002be80000 to sun.nio.ch.selectionkeyi...@4ac6ee33
>> java.io.IOException: Read error rc = -1 java.nio.DirectByteBuffer[pos=0 lim=4 cap=4]
>>         at org.apache.zookeeper.ClientCnxn$SendThread.doIO(ClientCnxn.java:653)
>>         at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:897)
>> 2009-08-14 13:37:04,721 INFO org.apache.zookeeper.ClientCnxn: Attempting connection to server ubuntu2/192.168.33.9:2222
>> 2009-08-14 13:37:08,872 WARN org.apache.zookeeper.ClientCnxn: Exception closing session 0x22313002be80001 to sun.nio.ch.selectionkeyi...@2e93ebe0
>> java.io.IOException: TIMED OUT
>>         at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:858)
>> 2009-08-14 13:37:08,873 WARN org.apache.zookeeper.ClientCnxn: Ignoring exception during shutdown output
>> java.net.SocketException: Transport endpoint is not connected
>>         at sun.nio.ch.SocketChannelImpl.shutdown(Native Method)
>>         at sun.nio.ch.SocketChannelImpl.shutdownOutput(SocketChannelImpl.java:651)
>>         at sun.nio.ch.SocketAdaptor.shutdownOutput(SocketAdaptor.java:368)
>>         at org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:956)
>>         at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:922)
>> 2009-08-14 13:37:09,486 INFO org.apache.zookeeper.ClientCnxn: Attempting connection to server ubuntu2/192.168.33.9:2222
>> 2009-08-14 13:37:12,712 WARN org.apache.zookeeper.ClientCnxn: Exception closing session 0x22313002be80000 to sun.nio.ch.selectionkeyi...@7162d703
>> java.io.IOException: TIMED OUT
>>         at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:858)
>> 2009-08-14 13:37:12,713 WARN org.apache.zookeeper.ClientCnxn: Ignoring exception during shutdown output
>> java.net.SocketException: Transport endpoint is not connected
>>         at sun.nio.ch.SocketChannelImpl.shutdown(Native Method)
>>         at sun.nio.ch.SocketChannelImpl.shutdownOutput(SocketChannelImpl.java:651)
>>         at sun.nio.ch.SocketAdaptor.shutdownOutput(SocketAdaptor.java:368)
>>         at org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:956)
>>         at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:922)
>> 2009-08-14 13:37:13,032 INFO org.apache.zookeeper.ClientCnxn: Attempting connection to server ubuntu3/192.168.33.8:2222
>> 2009-08-14 13:37:17,482 WARN org.apache.zookeeper.ClientCnxn: Exception closing session 0x22313002be80001 to sun.nio.ch.selectionkeyi...@1012401d
>> java.io.IOException: TIMED OUT
>>         at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:858)
>> 2009-08-14 13:37:17,483 WARN org.apache.zookeeper.ClientCnxn: Ignoring exception during shutdown output
>> java.net.SocketException: Transport endpoint is not connected
>>         at sun.nio.ch.SocketChannelImpl.shutdown(Native Method)
>>         at sun.nio.ch.SocketChannelImpl.shutdownOutput(SocketChannelImpl.java:651)
>>         at sun.nio.ch.SocketAdaptor.shutdownOutput(SocketAdaptor.java:368)
>>         at org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:956)
>>         at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:922)
>> 2009-08-14 13:37:17,856 INFO org.apache.zookeeper.ClientCnxn: Attempting connection to server ubuntu7/192.168.33.6:2222
>> 2009-08-14 13:37:19,445 INFO org.apache.zookeeper.ClientCnxn: Priming connection to java.nio.channels.SocketChannel[connected local=/192.168.33.7:40923 remote=ubuntu7/192.168.33.6:2222]
>> 2009-08-14 13:37:19,445 INFO org.apache.zookeeper.ClientCnxn: Server connection successful
>> 2009-08-14 13:37:21,022 WARN org.apache.zookeeper.ClientCnxn: Exception closing session 0x22313002be80000 to sun.nio.ch.selectionkeyi...@2e101b3a
>> java.io.IOException: TIMED OUT
>>         at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:858)
>> 2009-08-14 13:37:21,023 WARN org.apache.zookeeper.ClientCnxn: Ignoring exception during shutdown output
>> java.net.SocketException: Transport endpoint is not connected
>>         at sun.nio.ch.SocketChannelImpl.shutdown(Native Method)
>>         at sun.nio.ch.SocketChannelImpl.shutdownOutput(SocketChannelImpl.java:651)
>>         at sun.nio.ch.SocketAdaptor.shutdownOutput(SocketAdaptor.java:368)
>>         at org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:956)
>>         at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:922)
>> 2009-08-14 13:37:21,908 INFO org.apache.zookeeper.ClientCnxn: Attempting connection to server ubuntu7/192.168.33.6:2222
>> 2009-08-14 13:37:21,908 INFO org.apache.zookeeper.ClientCnxn: Priming connection to java.nio.channels.SocketChannel[connected local=/192.168.33.7:40926 remote=ubuntu7/192.168.33.6:2222]
>> 2009-08-14 13:37:21,909 INFO org.apache.zookeeper.ClientCnxn: Server connection successful
>> 2009-08-14 13:37:21,911 WARN org.apache.zookeeper.ClientCnxn: Exception closing session 0x22313002be80000 to sun.nio.ch.selectionkeyi...@6bdfe124
>> java.io.IOException: Session Expired
>>         at org.apache.zookeeper.ClientCnxn$SendThread.readConnectResult(ClientCnxn.java:548)
>>         at org.apache.zookeeper.ClientCnxn$SendThread.doIO(ClientCnxn.java:661)
>>         at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:897)
>> 2009-08-14 13:37:21,912 ERROR org.apache.hadoop.hbase.master.HMaster: Master lost its znode, killing itself now
>> Regards,
>> LvZheng
>>
>