[ 
https://issues.apache.org/jira/browse/HBASE-5353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13204268#comment-13204268
 ] 

stack commented on HBASE-5353:
------------------------------

bq. ...you only need to look two places if you have an issue.   If you have no 
idea where the master is, you have to hunt around the cluster to find it.

I'd imagine it would be hard to get this patch in if we have no idea where the 
master is.  (And, again, don't we have this problem now if you start up three 
masters and one fails?  You have to hunt around.  We need to build the redirect 
piece regardless: for example, a link to the master on each server page that 
redirects to the current master, and a history in zk of who was master when.)

You could even make the combined master+regionserver daemon work like our 
current multimaster system by having there be affinity for a certain set of 
servers.
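
To make the affinity idea concrete, here is a minimal sketch.  It assumes every 
server runs the combined daemon but only a configured subset is preferred when 
a master is chosen.  The function `pick_master` and its arguments are 
hypothetical, not HBase APIs; a real version would sit behind the zk election 
rather than a plain function call.

```python
def pick_master(live_servers, preferred):
    """Pick a master host from the live servers, favoring the preferred set.

    Hypothetical helper for illustration only; not an HBase API.
    """
    live = sorted(live_servers)          # deterministic order for the sketch
    for host in live:
        if host in preferred:
            return host                  # affinity: a preferred host is alive
    return live[0] if live else None     # otherwise fall back to any live server
```

With a preferred set of two or three hosts this behaves much like the current 
multimaster setup, and only falls back to an arbitrary server when none of the 
preferred hosts is up.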

What kind of nagios alerts would be master-specific?  We need to add 
indirection to these now anyway (ask zk who the master is) if more than one 
master is running.  Metrics could be a little complicated, especially if the 
master moved servers over the period of interest, but aren't master metrics 
generally of less interest, since they are mostly just aggregates and ganglia 
or opentsdb do a better job of this anyway?
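
As a rough sketch of that indirection, a nagios-style check could first resolve 
the current master and only then probe it.  Both callables below are 
hypothetical stand-ins: `resolve_master` for a read of the master's location 
from zk, and `probe` for whatever health check you already run against the 
master.  Only the exit codes (0 for OK, 2 for CRITICAL) are the standard Nagios 
plugin convention.

```python
OK, CRITICAL = 0, 2  # standard Nagios plugin exit codes


def check_master(resolve_master, probe):
    """Nagios-style check that asks who the master is, then probes that host.

    resolve_master: callable returning the active master's host name, or None
                    (hypothetical stand-in for asking zk who the master is).
    probe:          callable taking a host name and returning True if healthy
                    (hypothetical stand-in for an HTTP or RPC health probe).
    """
    host = resolve_master()
    if host is None:
        return CRITICAL, "CRITICAL: no active master registered"
    if not probe(host):
        return CRITICAL, "CRITICAL: master on %s is not responding" % host
    return OK, "OK: master is %s" % host
```

The alert definition then never names a master host, so it keeps working when 
the master moves to another server.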

Logs don't have to be interleaved.  That's just a bit of log4j config?

Yes, it could be an issue if the daemon is bogged down.  The master would be 
less responsive, which should be fine for short periods, but if sustained it 
could be an issue.

I'm not going to work on this.  I do see it as something that could simplify 
our deploy story.

> HA/Distributed HMaster via RegionServers
> ----------------------------------------
>
>                 Key: HBASE-5353
>                 URL: https://issues.apache.org/jira/browse/HBASE-5353
>             Project: HBase
>          Issue Type: Improvement
>          Components: master, regionserver
>    Affects Versions: 0.94.0
>            Reporter: Jesse Yates
>            Priority: Minor
>
> Currently, the HMaster node(s) must be considered a 'special' node (though 
> not a single point of failure), meaning that the node must be protected more 
> than the other cluster machines or at least specially monitored. Minimally, 
> we always need to ensure that the master is running, rather than letting the 
> system handle that internally. It should be possible to instead have the 
> HMaster be much more available, either in a distributed sense (meaning a big 
> rewrite) or multiple, dynamically created instances combined with the hot 
> fail-over of masters. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
