The following page has been changed by stack:
http://wiki.apache.org/hadoop/Hbase/MultipleMasters

------------------------------------------------------------------------------
  
  == Single Master Setup ==
  
- The 
[http://hadoop.apache.org/hbase/docs/current/api/overview-summary.html#overview_description
 Getting Started] documentation gets you into that state. A failure of the 
Master server will not be damageable in the first few minutes but your regions 
will be unable to split. If it does happen, you can go on any other machine 
with the correct installation/configuration and do {{{$ 
${HBASE_HOME}/bin/hbase-daemon.sh start master}}}. Currently the Hadoop 
Distributed Filesystem is '''not''' highly available so if the Namenode resides 
on the same machine is your Master, the cluster is still wedged and you will 
have to shut down HBase with a high probability of losing data.
+ The 
[http://hadoop.apache.org/hbase/docs/current/api/overview-summary.html#overview_description
 Getting Started] documentation describes how to set up a cluster with a single 
Master. A cluster can run without a Master for a few minutes, but your regions 
will be unable to split. If the Master does fail, you can go to any other machine 
with the correct installation/configuration and run {{{$ 
${HBASE_HOME}/bin/hbase-daemon.sh start master}}}. This newly started Master 
will take over Master functions.
+ 
+ Currently the Hadoop Distributed Filesystem is '''not''' highly available, so 
if the Namenode resides on the same machine as your Master and that machine 
fails, the cluster is still wedged and you will have to shut down HBase with a 
high probability of losing data.
  
  == Multiple Masters Setup ==
  
- Before setting up multiple Masters, you should already have built an HBase 
cluster with a single Master. If not, please refer to the Getting Started 
documentation.
+ Before setting up multiple Masters, you should already have built an HBase 
cluster with a single Master. If not, please refer to the 
[http://hadoop.apache.org/hbase/docs/current/api/overview-summary.html#overview_description
 Getting Started] documentation.
  
  === Basic knowledge ===
  
- The multi-master feature introduced in 0.20 does not add the cooperation of a 
score of Masters, there is still just one working Master while the other 
''backups'' wait. For example, if you start 200 Masters only 1 will be active 
while the others wait for it to die. The switch usually takes 
zookeeper.session.timeout plus a couple of seconds to occur. See "How it works 
inside" for more information.
+ The multi-master feature introduced in 0.20 does not add cooperating Masters; 
there is still just one working Master while the other ''backups'' wait. For 
example, if you start 200 Masters only 1 will be active while the others wait 
for it to die. The switch usually takes zookeeper.session.timeout plus a couple 
of seconds to occur. See "How it works inside" below for more information.
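
As a rough illustration of the switch time described above, the estimate is the session timeout plus a small overhead. This is only a sketch: the function name, the 60000 ms example value, and the 2000 ms overhead are assumptions chosen for illustration, not measured or default values.

```shell
# Rough failover estimate: zookeeper.session.timeout plus "a couple of seconds".
# Both numbers used below are illustrative assumptions, not measured defaults.
estimate_failover_ms() {
  session_timeout_ms="$1"   # value of zookeeper.session.timeout, in milliseconds
  extra_ms="${2:-2000}"     # assumed overhead for the backup Master to take over
  echo $(( session_timeout_ms + extra_ms ))
}

estimate_failover_ms 60000   # prints 62000
```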
  
  === Designing your highly available setup ===
  
- The rule of thumb here is to not put all your eggs in the same basket. You 
don't want a Namenode and a Master on the same machine because currently you 
can recover automatically from a Master failure but from a Namenode failure. Be 
sure that the Namenode has its own very reliable machine until Hadoop 0.21 
comes in with ''Backup Namenodes''. Also you don't want to have a Region Server 
with a Master as that machine failure will imply first a Master failover and 
then the new Master will have to split the logs of the failed RS. 
+ The rule of thumb here is to not put all your eggs in the same basket. You 
don't want a Namenode and a Master on the same machine because currently you 
can recover automatically from a Master failure but not from a Namenode 
failure. Be sure that the Namenode has its own very reliable machine until 
Hadoop 0.21 comes in with ''Backup Namenodes''. Also you don't want to have a 
Region Server and a Master on the same node, as a failure of that machine will 
first trigger a Master failover and then force the new Master to split the logs 
of the failed Region Server. 
  
  Your ideal highly available cluster would have 5 or more dedicated Zookeeper 
servers, 2-3 dedicated Master servers (one per rack for example), 1 very 
reliable Namenode/Job Tracker server with redundant hardware and the rest is 
the usual Datanode/Task Tracker/Region Server stack. If you don't even have 
twice that number of machines, you will have to evaluate some trade-offs. For 
example, you could keep one dedicated Master server and put the others 
alongside the Region Servers, as the failure of a backup Master doesn't have any 
impact, and you could do the same for the ZK servers.
  
@@ -26, +28 @@

  
  Currently handling the other Masters isn't really user friendly but it's 
getting worked on. When you start HBase, your first main Master will also be 
started. To start other Masters, run {{{$ ${HBASE_HOME}/bin/hbase-daemon.sh start 
master}}} on each node you want, as long as they have the correct 
installation/configuration. You could also run {{{$ 
${HBASE_HOME}/bin/hbase-daemons.sh start master}}}, which would start a 
Master on every machine listed in {{{conf/regionservers}}}.
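
If you only want backup Masters on a few chosen nodes, the per-node command above can be generated from a host list. The following is a dry-run sketch: the function name and the host-list file are hypothetical, passwordless ssh is assumed, and the function only prints the commands so you can review them before running anything.

```shell
# Dry-run sketch: prints one ssh command per host instead of executing it.
# The host-list file (one hostname per line, '#' for comments) is hypothetical.
backup_master_commands() {
  hosts_file="$1"    # file listing the nodes that should run a backup Master
  hbase_home="$2"    # HBase install directory on the remote nodes
  while IFS= read -r host || [ -n "$host" ]; do
    # Skip blank lines and comment lines.
    case "$host" in ''|'#'*) continue ;; esac
    printf 'ssh %s %s/bin/hbase-daemon.sh start master\n' "$host" "$hbase_home"
  done < "$hosts_file"
}
```

Once the printed commands look right, piping the output to {{{sh}}} would execute them.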
  
- To stop any Master without shutting down HBase, you currently have to {{{kill 
-9}}} it. If you kill the active Master, first make sure it's not splitting 
logs as you could lose data. To check that, tail the Master's log and watch for 
anything that says "Splitting logs # of #". 
+ To stop any Master '''without shutting down HBase''', you currently have to 
{{{kill -9}}} it. This is OK: all state is maintained elsewhere, in ZooKeeper 
and on the RegionServers. If you kill the active Master, first make sure it's 
not splitting logs, as you could lose data. To check that, tail the Master's 
log and watch for anything that says "Splitting logs # of #". 
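
The log check described above could be scripted roughly as follows. This is a sketch under assumptions: the function name, the log path you pass in, and the 200-line window are all hypothetical. It is deliberately conservative, since it also refuses when a split that has already finished still appears in the recent lines.

```shell
# Sketch only: the log path argument and the 200-line window are assumptions.
# Returns 0 when the recent log lines show no "Splitting logs" message,
# 1 when they do, and 2 when the log file cannot be read.
master_safe_to_kill() {
  log_file="$1"
  if [ ! -r "$log_file" ]; then
    echo "Cannot read $log_file" >&2
    return 2
  fi
  if tail -n 200 "$log_file" | grep -q "Splitting logs"; then
    echo "Master may be splitting logs; do not kill it yet" >&2
    return 1
  fi
  return 0
}
```

For example, {{{master_safe_to_kill /path/to/master.log && kill -9 "$MASTER_PID"}}}, where both the log path and {{{MASTER_PID}}} are placeholders you would fill in for your installation.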
  
  == How it works inside ==
  
