I think we should add a conf file for "backupmasters", or just use
"masters" with the first host in the list always being the one that
becomes master first (introducing a startup delay on the others should
ensure it grabs the ephemeral node first?).
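Below is a minimal sketch of the ordered-election idea, in Python purely for illustration. It assumes each master races for a single ephemeral node, and the host at position i in the list waits i times a fixed delay before trying, so the first listed host normally wins. FakeEphemeralNode, start_masters, and delay_per_rank are hypothetical names. The lock-protected create-if-absent only imitates ZooKeeper's behavior. In a real setup the backups would also need to watch the node and retry when it disappears, and that part is omitted here.

```python
import threading
import time


class FakeEphemeralNode:
    """Hypothetical in-memory stand-in for ZooKeeper's master ephemeral node.

    Only the first successful try_create() wins, mimicking how creating a
    node that already exists fails for everyone after the first creator.
    """

    def __init__(self):
        self._lock = threading.Lock()
        self.owner = None

    def try_create(self, host):
        # Create-if-absent under a lock: exactly one caller can become owner.
        with self._lock:
            if self.owner is None:
                self.owner = host
                return True
            return False


def start_masters(hosts, node, delay_per_rank=0.1):
    """Start one thread per master host, in list order.

    The host at position i sleeps i * delay_per_rank seconds before racing
    for the node, so the first listed host normally becomes active. This is
    a timing heuristic, not a guarantee.
    """
    roles = {}
    roles_lock = threading.Lock()

    def run(rank, host):
        time.sleep(rank * delay_per_rank)  # later hosts in the list back off longer
        won = node.try_create(host)
        with roles_lock:
            roles[host] = "active" if won else "backup"

    threads = [threading.Thread(target=run, args=(i, h)) for i, h in enumerate(hosts)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return roles
```

The delay only biases the race. If the first host starts late or is slow to connect, another host can still win, so this reads as a heuristic rather than a hard ordering.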
Shouldn't be too bad? If it's hard we could wait, but it seems like it
would be fairly simple.
JG
Jean-Daniel Cryans wrote:
Rong-En Fan,
I agree multi-master requires manual steps, and the current lack of docs
does not help (it's on my list, though).
I also agree that stop on a backup master shouldn't stop the cluster.
Can you file a Jira? (kill -9 works well, btw)
WRT multi-master configuration, I personally ruled it out of 0.20.0, but do
you think we should still include it for usability? Is it currently too
rough?
Thx,
J-D
On Thu, Jul 16, 2009 at 12:40 PM, Rong-en Fan wrote:
A few days ago, I played with the latest trunk to see how fault tolerance
works in 0.20. While running PerformanceEvaluation to generate load,
killing an HRegionServer (HRS) or the HMaster is not a big deal: the client
recovers after anywhere from tens of seconds to a few minutes. This is good.
For multiple masters, it seems that I have to start each backup master manually with
bin/hbase-daemon.sh start master
This is OK, though it would be better if we could specify this in
hbase-site.xml or in a new conf/masters file.
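Here is a minimal sketch of how such a conf/masters file might be read. It assumes, hypothetically, one hostname per line, with blank lines and lines starting with '#' ignored, and the first entry treated as the preferred master. parse_masters is an illustrative name, not an existing HBase function.

```python
def parse_masters(text):
    """Parse a hypothetical conf/masters file: one hostname per line.

    Blank lines and lines starting with '#' are skipped. Returns the
    preferred (first-listed) master and the list of backups.
    """
    hosts = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # ignore blanks and comment lines
        hosts.append(line)
    if not hosts:
        raise ValueError("conf/masters lists no hosts")
    return hosts[0], hosts[1:]
```

Keeping list order meaningful means the same file could both tell a start script where to launch masters and say which host is preferred.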
But stopping a backup master is messy. If I just run
bin/hbase-daemon.sh stop master
it brings the whole cluster down. That's bad.
Not sure if we can do something like this:
1. If there is already an active master, "stop master" just makes this
HMaster exit without shutting down the whole cluster.
2. Otherwise, shut down the whole cluster as before.
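The two rules can be sketched as a small decision function. This assumes that "there is an active master" means a master other than the one being stopped currently holds the master node, so the process being stopped is a backup. StopAction and on_stop_master are hypothetical names, not HBase APIs.

```python
from enum import Enum


class StopAction(Enum):
    EXIT_THIS_MASTER = "exit this HMaster only"
    SHUT_DOWN_CLUSTER = "shut down the whole cluster"


def on_stop_master(this_host, active_master):
    """Decide what 'stop master' should do on this_host.

    active_master is the host currently holding the master node, or None
    if no master is active.
    """
    if active_master is not None and active_master != this_host:
        # Rule 1: another master is active, so this one is a backup; just exit.
        return StopAction.EXIT_THIS_MASTER
    # Rule 2: this is the active master (or none is active); shut down as before.
    return StopAction.SHUT_DOWN_CLUSTER
```

Stopping a backup then leaves the cluster alone, while stopping the active master keeps today's whole-cluster shutdown behavior.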
Any ideas?
Thanks,
Rong-En Fan