[
https://issues.apache.org/jira/browse/MESOS-2934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14601362#comment-14601362
]
Craig W commented on MESOS-2934:
--------------------------------
I only tried standing up a 5-master cluster with quorum = 4, and it failed. I
then changed the quorum to 3 and it worked. I did not try growing from 3 to 5
or to 7 masters.
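
For reference, the two configurations reported as working in this issue (5
masters with quorum 3, and 3 masters with quorum 2) both use the smallest
strict majority of the master count. A minimal sketch of that arithmetic
follows. The helper name `majority_quorum` is hypothetical and not part of
Mesos, and the sketch assumes the quorum is meant to be the smallest majority.
It does not explain why quorum = 4 fails with 5 masters.

```python
def majority_quorum(num_masters: int) -> int:
    """Return the smallest strict majority of num_masters.

    Hypothetical helper for illustration only; this is not a Mesos API.
    """
    if num_masters < 1:
        raise ValueError("num_masters must be at least 1")
    # Integer division rounds down, so adding 1 gives the first value
    # strictly greater than half of num_masters.
    return num_masters // 2 + 1


if __name__ == "__main__":
    # Cluster sizes mentioned in this thread.
    for n in (3, 5, 7):
        print(f"{n} masters: quorum {majority_quorum(n)}")
```

Under that assumption, 7 masters would use quorum 4, which is the growth
case that was not tested here.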
> Mesos master crashes when quorum set to 4
> -----------------------------------------
>
> Key: MESOS-2934
> URL: https://issues.apache.org/jira/browse/MESOS-2934
> Project: Mesos
> Issue Type: Bug
> Components: master
> Affects Versions: 0.22.1
> Environment: CentOS 7
> Java 1.7.0_55
> Reporter: Craig W
> Priority: Minor
> Labels: documentaion
>
> When deploying 5 Mesos masters with quorum set to 4, the masters start up
> but fail to stay running. Instead they exit and then restart (Monit is used
> to supervise the process) within a few seconds. This cycle continues non-stop.
> The logs on the master look like this:
> {noformat}
> Received a recover response from a replica in EMPTY status
> Received a recover response from a replica in EMPTY status
> Replica in EMPTY status received a broadcasted recover request
> Recovery failed: Failed to recover registrar: Failed to perform fetch within 1mins
> Replica in EMPTY status received a broadcasted recover request
> Received a recover response from a replica in EMPTY status
> Received a recover response from a replica in EMPTY status
> Replica in EMPTY status received a broadcasted recover
> The newly elected leader is master@<ip>:5050 with id 20150625-102436-748881418-5050-2157
> Elected as the leading master!
> Recovering from registrar
> Recovering registrar
> Unable to finish the recover protocol in 10secs, retrying
> Unable to finish the recover protocol in 10secs, retrying
> Recovery failed: Failed to recover registrar: Failed to perform fetch within 1mins
> {noformat}
> When I change the quorum to 2 and run just 3 Mesos master processes, the
> cluster stays up without a hitch.