Hi Huang,

Thanks for offering to help, but the original issue with the ceph-mons not connecting was diagnosed last week as a possible networking error at the hardware level. We first removed all the mons except one so that it could come online without waiting for a quorum; the networking problem was diagnosed and fixed after that. We are pretty sure it was ultimately the result of aging hardware being pushed to its limit by the rebuilding and repair work, combined with several changes I made in a short period of time.
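For anyone who finds this thread later: the general shape of dropping to a single mon, following the documented extract/edit/inject monmap procedure, is sketched below. This is not a transcript of the commands we ran. The `run` and `shrink_monmap_to_one` helpers and the /tmp/monmap path are made up for illustration, the mon IDs are placeholders, and the surviving ceph-mon has to be stopped first (how depends on your init system).

```shell
#!/bin/sh
# Sketch only: shrink the monmap to one surviving monitor so that it can
# form a quorum of one. Helper names and the monmap path are hypothetical.
# Stop the surviving ceph-mon daemon BEFORE running this.

MONMAP="${MONMAP:-/tmp/monmap}"

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# usage: shrink_monmap_to_one <surviving-mon-id> <mon-id-to-remove>...
shrink_monmap_to_one() {
    keep="$1"
    shift
    # Dump the monmap from the surviving monitor's local store.
    run ceph-mon -i "$keep" --extract-monmap "$MONMAP" || return 1
    # Remove every monitor that should no longer be in the map.
    for dead in "$@"; do
        run monmaptool "$MONMAP" --rm "$dead" || return 1
    done
    # Show the edited map for review, then write it back into the store.
    run monmaptool --print "$MONMAP" || return 1
    run ceph-mon -i "$keep" --inject-monmap "$MONMAP" || return 1
}
```

Running it with DRY_RUN=1 first, for example `DRY_RUN=1 shrink_monmap_to_one mgmt1 mon1 mon2 mon3 mon4`, only prints the commands, which makes it easy to check the mon IDs before touching the real store.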
On Tue, Sep 3, 2019 at 9:41 PM huang jun <[email protected]> wrote:

> can you set debug_mon=20 and debug_paxos=20 and debug_ms=1 on all mon
> and get log?
>
> Ashley Merrick <[email protected]> wrote on Tue, Sep 3, 2019 at 9:35 PM:
> >
> > What change did you make in ceph.conf
> >
> > I'd check that hasn't caused an issue first.
> >
> >
> > ---- On Tue, 27 Aug 2019 04:37:15 +0800 [email protected] wrote ----
> >
> > Hello,
> >
> > I have an old ceph 0.94.10 cluster that had 10 storage nodes with one
> > extra management node used for running commands on the cluster. Over time
> > we'd had some hardware failures on some of the storage nodes, so we're down
> > to 6, with ceph-mon running on the management server and 4 of the storage
> > nodes. We attempted deploying a ceph.conf change and restarted ceph-mon and
> > ceph-osd services, but the cluster went down on us. We found all the
> > ceph-mons are stuck in the electing state, I can't get any response from
> > any ceph commands but I found I can contact the daemon directly and get
> > this information (hostnames removed for privacy reasons):
> >
> > root@<mgmt1>:~# ceph daemon mon.<mgmt1> mon_status
> > {
> >     "name": "<mgmt1>",
> >     "rank": 0,
> >     "state": "electing",
> >     "election_epoch": 4327,
> >     "quorum": [],
> >     "outside_quorum": [],
> >     "extra_probe_peers": [],
> >     "sync_provider": [],
> >     "monmap": {
> >         "epoch": 10,
> >         "fsid": "69611c75-200f-4861-8709-8a0adc64a1c9",
> >         "modified": "2019-08-23 08:20:57.620147",
> >         "created": "0.000000",
> >         "mons": [
> >             {
> >                 "rank": 0,
> >                 "name": "<mgmt1>",
> >                 "addr": "[fdc4:8570:e14c:132d::15]:6789\/0"
> >             },
> >             {
> >                 "rank": 1,
> >                 "name": "<mon1>",
> >                 "addr": "[fdc4:8570:e14c:132d::16]:6789\/0"
> >             },
> >             {
> >                 "rank": 2,
> >                 "name": "<mon2>",
> >                 "addr": "[fdc4:8570:e14c:132d::28]:6789\/0"
> >             },
> >             {
> >                 "rank": 3,
> >                 "name": "<mon3>",
> >                 "addr": "[fdc4:8570:e14c:132d::29]:6789\/0"
> >             },
> >             {
> >                 "rank": 4,
> >                 "name": "<mon4>",
> >                 "addr": "[fdc4:8570:e14c:132d::151]:6789\/0"
> >             }
> >         ]
> >     }
> > }
> >
> >
> > Is there any way to force the cluster back into a quorum even if it's
> > just one mon running to start it up? I've tried exporting the mgmt's monmap
> > and injecting it into the other nodes, but it didn't make any difference.
> >
> > Thanks!
> > _______________________________________________
> > ceph-users mailing list -- [email protected]
> > To unsubscribe send an email to [email protected]
