[
https://issues.apache.org/jira/browse/HBASE-5353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13204268#comment-13204268
]
stack commented on HBASE-5353:
------------------------------
bq. ...you only need to look two places if you have an issue. If you have no
idea where the master is, you have to hunt around the cluster to find it.
I'd imagine it'd be hard getting this patch in if there were no way of knowing
where the master is. (And, again, don't we have this problem now if you start
up three masters and one fails? You have to hunt around. We need to build the
redirect piece regardless, such as a link to the master on each server page
that redirects to the current master, and a history in zk of who was master
when.)
You could even make the combined master+regionserver daemon work like our
current multimaster system by giving the master role affinity for a certain
set of servers.
What kind of nagios alerts would be master-specific? We need to add
indirection to these now anyway (ask zk who the master is) if more than one
master is running. Metrics could be a little complicated, especially if the
master moved servers over the period of interest, but aren't master metrics
generally of less interest, since they are mostly just aggregates and ganglia
or opentsdb do a better job of this anyway?
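To make the metrics point concrete, here is a minimal sketch of how a "who was
master when" history could be used to find which servers held the master role
over a period of interest. It is an in-memory stand-in only: the class name
MasterHistory, its methods, and the idea of keying terms by start time are
assumptions for illustration, and nothing here reads from zk or reflects
existing HBase code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical in-memory stand-in for a master history that would live in zk. */
final class MasterHistory {
    // Start time (ms) of each master term -> server that became master then.
    private final TreeMap<Long, String> terms = new TreeMap<>();

    /** Record that the given server became master at startMillis. */
    void record(long startMillis, String server) {
        terms.put(startMillis, server);
    }

    /** Server that was master at timeMillis, or null if before the first term. */
    String masterAt(long timeMillis) {
        Map.Entry<Long, String> e = terms.floorEntry(timeMillis);
        return e == null ? null : e.getValue();
    }

    /** Most recently recorded master, or null if nothing has been recorded. */
    String currentMaster() {
        return terms.isEmpty() ? null : terms.lastEntry().getValue();
    }

    /** Servers that were master at any point in [fromMillis, toMillis], in time order. */
    List<String> mastersDuring(long fromMillis, long toMillis) {
        List<String> result = new ArrayList<>();
        if (fromMillis > toMillis) {
            return result;
        }
        // Include the term already in progress when the window opens, if any.
        Long inProgress = terms.floorKey(fromMillis);
        long start = inProgress == null ? fromMillis : inProgress;
        for (String server : terms.subMap(start, true, toMillis, true).values()) {
            // Collapse back-to-back terms held by the same server.
            if (result.isEmpty() || !result.get(result.size() - 1).equals(server)) {
                result.add(server);
            }
        }
        return result;
    }
}
```

With terms recorded at 100 (rs1), 200 (rs2) and 300 (rs1), mastersDuring(150,
250) returns rs1 then rs2, which is the list of servers a metrics query for
that window would have to cover. A nagios check or redirect link would only
need currentMaster().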
Logs don't have to be interleaved. That's just a bit of log4j config?
Yes, it could be an issue if the daemon is bogged down. The master would be
less responsive, which should be fine for short periods, but if sustained it
could be an issue.
I'm not going to work on this. I do see it as something that could simplify
our deploy story.
> HA/Distributed HMaster via RegionServers
> ----------------------------------------
>
> Key: HBASE-5353
> URL: https://issues.apache.org/jira/browse/HBASE-5353
> Project: HBase
> Issue Type: Improvement
> Components: master, regionserver
> Affects Versions: 0.94.0
> Reporter: Jesse Yates
> Priority: Minor
>
> Currently, the HMaster node(s) must be considered a 'special' node (though
> not a single point of failure), meaning that the node must be protected more
> than the other cluster machines or at least specially monitored. Minimally,
> we always need to ensure that the master is running, rather than letting the
> system handle that internally. It should be possible to instead have the
> HMaster be much more available, either in a distributed sense (meaning a big
> rewrite) or multiple, dynamically created instances combined with the hot
> fail-over of masters.