Although I'm not a dev, I looked into the code [0] anyway.

The comments before the maybe_resize_cluster function say:

 * If a cluster is undersized (with respect to max_mds), then
 * attempt to find daemons to grow it. If the cluster is oversized
 * (with respect to max_mds) then shrink it by stopping its highest rank.

Is it possible that an operator/admin tried to resize the MDS cluster (shrink or grow the number of MDS daemons)? Or was a DeepSea stage executed to deploy additional daemons? Maybe some history could help us understand what happened.
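If you can get a MON up long enough to answer queries, something like this might show whether max_mds still matches the number of running daemons (a sketch; "cephfs" is a placeholder for the real filesystem name):

```
# List every filesystem's max_mds and standby situation
ceph fs dump | grep -E 'name|max_mds|standby'

# Or query a single filesystem directly (replace "cephfs")
ceph fs get cephfs | grep max_mds
```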

[0] https://github.com/ceph/ceph/blob/v14.2.22/src/mon/MDSMonitor.cc#L1801

Quoting Eugen Block <ebl...@nde.ag>:

That does look strange indeed; either an upgrade went wrong or someone already fiddled with the monmap, I'd say. But anyway, I wouldn't try to deploy a 4th MON, since it would want to sync the store, and we don't know what state the store is actually in. And besides that, 2 out of 4 MONs still isn't a quorum, so there's no real benefit. So my best bet would be the MON with the most recent store. And if the cluster comes back up with one MON, you'll need to wipe the traces of the previous MONs so DeepSea can redeploy additional MONs cleanly. Or is the cluster not managed by DeepSea anymore?
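To judge which MON has the most recent store, a rough offline comparison could look like this (a sketch; run on each MON host with the daemons stopped, paths assume the default layout):

```
# Compare store size and last-modified times on each MON host
du -sh /var/lib/ceph/mon/ceph-*/store.db
ls -lt /var/lib/ceph/mon/ceph-*/store.db | head

# And back up every store before changing anything
cp -a /var/lib/ceph/mon/ceph-mon3 /root/mon3-store-backup
```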

Quoting Miles Goodhew <c...@m0les.com>:

Eugen,
Sorry, I forgot to add that this is what the monmap looks like now (IPs/names sanitised):

```
min_mon_release 14 (nautilus)
0: [v2:IP_MON3:3300/0,v1:IP_MON3:6789/0] mon3
1: v1:IP_MON1:6789/0 mon1
2: v1:IP_MON2:6789/0 mon2
```

Not sure why mon3 has the v2 + v1 setup and mon1/2 don't.
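My only guess (unverified) is that the v2 address appears once `ceph mon enable-msgr2` has been run after the Nautilus upgrade, and that this only ever took effect on mon3. Once the cluster is healthy again, it could presumably be applied everywhere:

```
# Nautilus+: rewrites each MON's address to the [v2:...,v1:...] pair
ceph mon enable-msgr2
```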

Thanks again,

M0les.

On Wed, 18 Jun 2025, at 17:42, Miles Goodhew wrote:
Hi Eugen,
 Thanks for your response.

For what it's worth, the things I've done overnight include stopping all the remaining daemons (OSDs and RGWs were the ones still running), so I'm just dealing with the 3 MONs now. Trying different start sequences, I can determine:

* mon3 was the last one working
* Starting mon1 will kill mon3 (and prevent it starting) with that crash mentioned in the original email
* Similarly starting mon2 will kill both mon1 and mon3 in the same way
* Only mon3 gets the fast spamming of "e6 handle_auth_request failed to assign global_id" log messages when it's running.
* Dumping the monmap results in the same file on all 3 mons (see the sketch below for how I compared them).
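For reference, the comparison was roughly this (a sketch; each MON stopped while its map is extracted, IDs per our naming):

```
# On each MON host, with the local daemon stopped
ceph-mon -i mon1 --extract-monmap /tmp/monmap.mon1
monmaptool --print /tmp/monmap.mon1

# Collected on one host: identical checksums on all 3
md5sum /tmp/monmap.mon*
```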

As for your suggestion of reducing the monmap to 1 node and rebuilding: we were also thinking of heading down that path. I'm hoping that deploying a temporary 4th MON on a new node might get two nodes running (without killing the "old" one), probably using mon3 because it's likely the most up to date. If that works, we could try clobbering and redeploying the other two "old" MON daemons and removing the temporary one to get back to the original 3 MONs. As you say: using their original IP addresses (one of the clients is OpenStack/RBD, which can be sentimental about MON IPs).

I'm just in a bit of decision paralysis about which mon to take as the survivor. All can run _individually_, but only mon2 will survive a group start. mon3 was the last one working, but it has the mysterious "failed to assign global ID" errors. I'm leaning toward using mon3... or mon2.

Thanks for listening,

M0les.


On Wed, 18 Jun 2025, at 17:04, Eugen Block wrote:
Hi,

correct, SUSE's Ceph product was Salt-based; in this case, 14.2.22 was
shipped with SES 6. ;-)

Do you also have some mon logs from right before the crash, maybe with
a higher debug level? It could make sense to stop client traffic and
OSDs as well to be able to recover. But unfortunately, I can't really
comment on the stack trace.
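
For the record, raising the MON debug level could look like this (a sketch; either via ceph.conf or inline, running in the foreground so the crash lands on the console):

```
# /etc/ceph/ceph.conf on the MON host:
# [mon]
#     debug mon = 20
#     debug ms = 1

# Or start the daemon in debug/foreground mode with the options inline
ceph-mon -i mon3 -d --debug_mon 20 --debug_ms 1
```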

Maybe someone has a different idea, but if you get one MON up, I would
probably reduce the monmap to 1 MON to bring the cluster back up. Back
up all the MON stores, just in case you have to start over. Then
extract the monmap, remove all but one, and inject the modified monmap
into the MON you want to revive. The procedure is described here [0].
Just don't change the address but only reduce the monmap. ;-)
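
In rough commands, the procedure from [0] boils down to something like this (a sketch; all MONs stopped first, mon3 as the survivor, everything backed up beforehand):

```
# With ALL MONs stopped, on the surviving MON's host (mon3 here)
ceph-mon -i mon3 --extract-monmap /tmp/monmap

# Inspect, then drop the other two MONs from the map
monmaptool --print /tmp/monmap
monmaptool /tmp/monmap --rm mon1 --rm mon2

# Inject the reduced map and start only this MON
ceph-mon -i mon3 --inject-monmap /tmp/monmap
systemctl start ceph-mon@mon3
```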

Regards,
Eugen

[0]
https://docs.ceph.com/en/latest/rados/operations/add-or-rm-mons/#changing-a-monitor-s-ip-address-advanced-method

Quoting Miles Goodhew <c...@m0les.com>:

Hi,
  I've been called in by a client with an ancient SUSE-based Ceph
Nautilus (14.2.22) cluster whose MONs keep dying oddly.
  Apparently the issue started with MDS daemons not working, and
eventually a MON restart killed the cluster.

OS: SLES 15-SP1 (out of support)
Ceph: 14.2.22 "Nautilus" (Deployed with Salt... I think)
3 MONs; 5 MDSs; 3 MGRs; 4 RGWs; 336 OSDs on 21 nodes.
Client services: "one of everything at least", but RBD/OpenStack,
S3/RGW and CephFS are the big ones.

  After sorting through some of the logs, here are some things I know:
disk space, RAM availability, inodes and network connectivity seem
OK to me. After shutting down all the MONs, MGRs and MDSs, one MON
can usually be started, but it sits there spamming out log messages
like "[SERVICE_ID](probing) e6 handle_auth_request failed to assign
global_id" (maybe 50 - 100 times per second). All the while, the
syslog shows "e6 get_health_metrics reporting [INCREASING_NUMBER]
slow ops" fairly often. This is probably due to OSDs and clients
still being active.

  If I start one of the other MONs, the running one dies with a
stack trace (limited to C++/library-internal calls):

```
8: (std::__throw_out_of_range(char const*)+0x41) [0x7f2a5983fa07]
9: (MDSMonitor::maybe_resize_cluster(FSMap&, int)+0xcf0) [0x55b441e37490]
10: (MDSMonitor::tick()+0xc9) [0x55b441e38ce9]
11: (MDSMonitor::on_active()+0x28) [0x55b441e22fa8]
12: (PaxosService::_active()+0xdd) [0x55b441d7188d]
13: (Context::complete(int)+0x9) [0x55b441c888a9]
14: (void finish_contexts<std::__cxx11::list<Context*, std::allocator<Context*> > >(CephContext*, std::__cxx11::list<Context*, std::allocator<Context*> >&, int)+0xa8) [0x55b441cb2408]
15: (Paxos::finish_round()+0x76) [0x55b441d681b6]
16: (Paxos::handle_last(boost::intrusive_ptr<MonOpRequest>)+0xc1f) [0x55b441d693df]
17: (Paxos::dispatch(boost::intrusive_ptr<MonOpRequest>)+0x233) [0x55b441d69e23]
18: (Monitor::dispatch_op(boost::intrusive_ptr<MonOpRequest>)+0x1668) [0x55b441c820b8]
19: (Monitor::_ms_dispatch(Message*)+0xa3a) [0x55b441c82b5a]
20: (Monitor::ms_dispatch(Message*)+0x26) [0x55b441cb3646]
21: (Dispatcher::ms_dispatch2(boost::intrusive_ptr<Message> const&)+0x26) [0x55b441cb00b6]
22: (DispatchQueue::entry()+0x1279) [0x7f2a5b188379]
23: (DispatchQueue::DispatchThread::entry()+0xd) [0x7f2a5b238a5d]
24: (()+0x8539) [0x7f2a59db7539]
25: (clone()+0x3f) [0x7f2a58f87ecf]
```

Anyone got any clues about how to diagnose, or better yet repair, this?

Sorry, I know this is a bit half-baked, but I'm trying to dump this
help request at COB to see if I can hook anyone's interest overnight.

Thanks for at least reading this far,

M0les.
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io

