[
https://issues.apache.org/jira/browse/MESOS-5193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15264707#comment-15264707
]
Priyanka Gupta commented on MESOS-5193:
---------------------------------------
[~bmahler] The ZooKeeper connectivity issues arise because ZooKeeper is also
set up on the same nodes as the Mesos master. Configuration-wise, we have 3
nodes, each running zk, mesos-master and mesos-slave. As for restarts, these
are RHEL 6 boxes with an init.d service that runs the processes. However,
once a master process is killed, the service terminates as well.
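
For reference, here is a minimal sketch of how the --quorum value relates to
the number of masters, assuming the usual rule that the quorum must be a
strict majority of the masters. The helper name majority_quorum is
hypothetical and is not part of Mesos.

```shell
# Hypothetical helper: smallest strict majority of n masters.
majority_quorum() {
  n=$1
  echo $(( n / 2 + 1 ))
}

# With the 3 masters described above, this yields the --quorum=2
# that we already pass to mesos-master.
masters=3
echo "--quorum=$(majority_quorum "$masters")"
```

With 3 masters the result matches the value we already use, so the quorum
setting itself appears consistent with the cluster size.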
> Recovery failed: Failed to recover registrar on reboot of mesos master
> ----------------------------------------------------------------------
>
> Key: MESOS-5193
> URL: https://issues.apache.org/jira/browse/MESOS-5193
> Project: Mesos
> Issue Type: Bug
> Components: master
> Affects Versions: 0.22.0, 0.27.0
> Reporter: Priyanka Gupta
> Labels: master, mesosphere
> Attachments: node1.log, node1_after_work_dir.log, node2.log,
> node2_after_work_dir.log, node3.log, node3_after_work_dir.log
>
>
> Hi all,
> We are running a 3-node cluster with mesos master, mesos slave and zookeeper
> on all of them, with Chronos on top. The problem is that when we reboot
> the mesos master leader, the other nodes try to get elected as leader but
> fail with a registrar recovery error:
> "Recovery failed: Failed to recover registrar: Failed to perform fetch within
> 1mins"
> The next node then tries to become the leader but fails again with the same
> error. I am not sure what causes this. We are currently running mesos 0.22
> and also tried upgrading to mesos 0.27, but the problem persists.
> /usr/sbin/mesos-master --work_dir=/tmp/mesos_dir
> --zk=zk://node1:2181,node2:2181,node3:2181/mesos --quorum=2
> Can you please help us resolve this issue, as it's a production system?
> Thanks,
> Priyanka
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)