A few days ago, I played with the latest trunk to see how fault tolerance works in 0.20. While running PerformanceEvaluation to generate workloads, killing an HRegionServer or the HMaster is not a big deal. The client recovers after tens of seconds to a few minutes. This is good.
For multiple masters, it seems that I have to start the backup master manually with

    bin/hbase-daemon.sh start master

This is OK, though it would be better if we could specify it in hbase-site.xml or in a new conf/masters file.

But stopping the backup master is messy. If I just do

    bin/hbase-daemon.sh stop master

it brings the whole cluster down. That's bad.

Not sure if we can do something like this:

1. If there is an active master, stop master just makes the HMaster die without shutting down the whole cluster.
2. Otherwise, shut down the whole cluster as before.

Any ideas?

Thanks,
Rong-En Fan
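
P.S. To make the two-case idea concrete, here is a rough sketch of the decision only. It does not touch HBase at all: the function name decide_stop_action and its yes/no argument are made up for illustration, and how a master would actually learn whether another master is active is left open. It also assumes that "there is an active master" means a master other than the one being stopped.

```shell
#!/bin/sh
# Hypothetical sketch of the proposed "stop master" behaviour.
# $1 is "yes" if another master is currently active, anything else otherwise.
# Nothing here is an existing HBase script or option.
decide_stop_action() {
  if [ "$1" = "yes" ]; then
    # Case 1: another master is active, so only this HMaster process exits.
    echo "stop-this-master-only"
  else
    # Case 2: no other active master, so shut down the whole cluster as before.
    echo "shutdown-cluster"
  fi
}
```

Calling decide_stop_action yes selects case 1, and any other argument selects case 2, which keeps today's behaviour as the default.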
