Just nuke the monitor's store, remove it from the monmap, and add it back
from scratch. Injecting maps correctly is non-trivial and something
obviously went wrong here, whereas re-syncing a monitor from the quorum is
pretty cheap.
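
For reference, here is a rough sketch of that procedure for tc. It follows the "remove a monitor / add a monitor" steps in the Ceph docs. It assumes several things about your deployment: the default cluster name "ceph", the default store path /var/lib/ceph/mon/ceph-tc, systemd units named ceph-mon@tc, and an admin keyring on the host where it runs. Check each of these against your setup. By default the script only prints the commands. Set DRY_RUN=0 to actually run them.

```shell
#!/bin/sh
# Sketch: rebuild monitor "tc" with an empty store so it re-syncs from the
# healthy quorum. ASSUMPTIONS: cluster name "ceph", default mon_data path,
# systemd-managed daemons, admin keyring available on this host.
set -eu

MON_ID=tc
MON_ADDR=192.168.60.11:6789             # tc's address from the monmap in this thread
MON_DIR=/var/lib/ceph/mon/ceph-$MON_ID  # default store location (assumption)
TMP=$(mktemp -d)                        # scratch dir for the fetched map and key
DRY_RUN=${DRY_RUN:-1}                   # 1 = print commands only, 0 = execute

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

run systemctl stop "ceph-mon@$MON_ID"
run ceph mon remove "$MON_ID"                  # drop tc from the monmap; the other three keep quorum
run mv "$MON_DIR" "$MON_DIR.bak"               # move the old store aside instead of deleting it
run ceph mon getmap -o "$TMP/monmap"           # current monmap from the quorum
run ceph auth get mon. -o "$TMP/mon.keyring"   # shared mon. key
run mkdir -p "$MON_DIR"
run ceph-mon -i "$MON_ID" --mkfs --monmap "$TMP/monmap" --keyring "$TMP/mon.keyring"
run chown -R ceph:ceph "$MON_DIR"              # daemons run as the ceph user
run ceph mon add "$MON_ID" "$MON_ADDR"         # put tc back into the monmap
run systemctl start "ceph-mon@$MON_ID"         # tc should probe, sync, and join
```

Run it once as-is to review the printed plan, then run it with DRY_RUN=0. Moving the old store aside rather than deleting it leaves you a way back if the problem turns out to be something else.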

On Thu, Jun 20, 2019 at 6:46 AM ☣Adam <[email protected]> wrote:

> Anyone have any suggestions for how to troubleshoot this issue?
>
>
> -------- Forwarded Message --------
> Subject: Monitor stuck at "probing"
> Date: Fri, 14 Jun 2019 21:40:39 -0500
> From: ☣Adam <[email protected]>
> To: [email protected]
>
> I have a monitor which I just can't seem to get to join the quorum, even
> after injecting a monmap from one of the other servers.[1]  I use NTP on
> all servers and also manually verified the clocks are synchronized.
>
>
> My monitors are named: ceph0, ceph2, xe, and tc
>
> I'm transitioning away from the ceph# naming scheme, so please forgive
> the confusing [lack of a] naming convention.
>
>
> The relevant output from: ceph -s
> 1/4 mons down, quorum ceph0,ceph2,xe
> mon: 4 daemons, quorum ceph0,ceph2,xe, out of quorum: tc
>
>
> tc is up, bound to the expected IP address, and the ceph-mon service can
> be reached from xe, ceph0 and ceph2 using telnet.  The mon_host and
> mon_initial_members from `ceph daemon mon.tc config show` look correct.
>
> mon_status on tc shows the state as "probing" and the list of
> "extra_probe_peers" looks correct (correct IP addresses, and ports).
> However, the monmap section looks wrong.  The "mons" list has all 4 servers,
> but the addr and public_addr values are 0.0.0.0:0.  Furthermore, it says
> the monmap epoch is 4.  I don't understand why, because I just injected a
> monmap which has an epoch of 7.
>
> Here's the output of: monmaptool --print ./monmap
> monmaptool: monmap file ./monmap
> epoch 7
> fsid a690e404-3152-4804-a960-8b52abf3bd65
> last_changed 2019-06-02 17:38:50.161035
> created 2018-12-28 20:26:41.443339
> 0: 192.168.60.10:6789/0 mon.ceph0
> 1: 192.168.60.11:6789/0 mon.tc
> 2: 192.168.60.12:6789/0 mon.ceph2
> 3: 192.168.60.53:6789/0 mon.xe
>
> When I injected it, I stopped ceph-mon, ran:
> sudo ceph-mon -i tc --inject-monmap ./monmap
>
> and started ceph-mon again.  I then rebooted to see if it would fix this
> epoch/addr issue.  It did not.
>
> I'm attaching what I believe is the relevant section of my log file from
> the tc monitor.  I ran `ceph auth list` on tc and ceph2 and verified
> that the output is identical.  This check was based on what I saw in the
> log and what I read in a blog post.[2]
>
> What are the next steps in troubleshooting this issue?
>
>
> Thanks,
> Adam
>
>
> [1] http://docs.ceph.com/docs/jewel/rados/troubleshooting/troubleshooting-mon/
> [2] https://medium.com/@george.shuklin/silly-mistakes-with-ceph-mon-9ef6c9eaab54
>
> _______________________________________________
> ceph-users mailing list
> [email protected]
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>