Although I'm not a dev, I looked into the code [0] anyway.

The comments before the maybe_resize_cluster function say:

 * If a cluster is undersized (with respect to max_mds), then
 * attempt to find daemons to grow it. If the cluster is oversized
 * (with respect to max_mds) then shrink it by stopping its highest rank.

Is it possible that an operator/admin tried to resize the MDS cluster (shrink or grow the number of MDS daemons)? Or was a DeepSea stage executed to deploy additional daemons? Maybe some history could help us understand what happened.
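If you can get a MON up long enough to answer queries, something like this might show whether max_mds still matches the number of running daemons (a sketch; "cephfs" is a placeholder for the real filesystem name):

```
# List every filesystem's max_mds and standby situation
ceph fs dump | grep -E 'name|max_mds|standby'

# Or query a single filesystem directly (replace "cephfs")
ceph fs get cephfs | grep max_mds
```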

[0] https://github.com/ceph/ceph/blob/v14.2.22/src/mon/MDSMonitor.cc#L1801

Quoting Eugen Block <ebl...@nde.ag>:

That does look strange indeed; either an upgrade went wrong or someone already fiddled with the monmap, I'd say. But anyway, I wouldn't try to deploy a 4th MON, since it would want to sync the store, and we don't know what state the store is actually in. And besides that, 2 out of 4 MONs still isn't a quorum, so there's no real benefit. So my best bet would be the MON with the most recent store. And if the cluster comes back up with one MON, you'll need to wipe the traces of the previous MONs so DeepSea can redeploy additional MONs cleanly. Or is the cluster not managed by DeepSea anymore?
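To judge which MON has the most recent store, a rough offline comparison could look like this (a sketch; run on each MON host with the daemons stopped, paths assume the default layout):

```
# Compare store size and last-modified times on each MON host
du -sh /var/lib/ceph/mon/ceph-*/store.db
ls -lt /var/lib/ceph/mon/ceph-*/store.db | head

# And back up every store before changing anything
cp -a /var/lib/ceph/mon/ceph-mon3 /root/mon3-store-backup
```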

Quoting Miles Goodhew <c...@m0les.com>:

Eugen,
Sorry, I forgot to add that this is what the monmap looks like now (IPs/names sanitised):

```
min_mon_release 14 (nautilus)
0: [v2:IP_MON3:3300/0,v1:IP_MON3:6789/0] mon3
1: v1:IP_MON1:6789/0 mon1
2: v1:IP_MON2:6789/0 mon2
```

Not sure why mon3 has the v2 + v1 setup and mon1/2 don't.
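My only guess (unverified) is that the v2 address appears once `ceph mon enable-msgr2` has been run after the Nautilus upgrade, and that this only ever took effect on mon3. Once the cluster is healthy again, it could presumably be applied everywhere:

```
# Nautilus+: rewrites each MON's address to the [v2:...,v1:...] pair
ceph mon enable-msgr2
```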

Thanks again,

M0les.

On Wed, 18 Jun 2025, at 17:42, Miles Goodhew wrote:
Hi Eugen,
 Thanks for your response.

For what it's worth, the things I've done overnight include stopping all the remaining daemons (OSDs and RGWs were the ones still running), so I'm just dealing with the 3 MONs now. Trying different start sequences, I can determine:

* mon3 was the last one working
* Starting mon1 will kill mon3 (and prevent it starting) with that crash mentioned in the original email
* Similarly starting mon2 will kill both mon1 and mon3 in the same way
* Only mon3 gets the fast spamming of "e6 handle_auth_request failed to assign global_id" log messages when it's running.
* Dumping the monmap results in the same file on all 3 mons (see the sketch below for how I compared them).
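For reference, the comparison was roughly this (a sketch; each MON stopped while its map is extracted, IDs per our naming):

```
# On each MON host, with the local daemon stopped
ceph-mon -i mon1 --extract-monmap /tmp/monmap.mon1
monmaptool --print /tmp/monmap.mon1

# Collected on one host: identical checksums on all 3
md5sum /tmp/monmap.mon*
```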

As for your suggestion of reducing the monmap to 1 node and rebuilding: we were also thinking of heading down that path. I'm hoping that deploying a temporary 4th MON on a new node might get two nodes running (without killing the "old" one), probably using mon3 because it's likely the most up to date. If that works, we could try clobbering and redeploying the other two "old" MON daemons and removing the temporary one to get back to the original 3 MONs. As you say: using their original IP addresses (one of the clients is OpenStack/RBD, which can be sentimental about MON IPs).

I'm just in a bit of decision paralysis about which mon to take as the survivor. All can run _individually_, but only mon2 will survive a group start. mon3 was the last one working, but it has the mysterious "failed to assign global ID" errors. I'm leaning toward using mon3... or mon2.

Thanks for listening,

M0les.


On Wed, 18 Jun 2025, at 17:04, Eugen Block wrote:
Hi,

correct, SUSE's Ceph product was Salt-based; in this case, 14.2.22 was
shipped with SES 6. ;-)

Do you also have some mon logs from right before the crash, maybe with
a higher debug level? It could make sense to stop client traffic and
OSDs as well to be able to recover. But unfortunately, I can't really
comment on the stack trace.
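
For the record, raising the MON debug level could look like this (a sketch; either via ceph.conf or inline, running in the foreground so the crash lands on the console):

```
# /etc/ceph/ceph.conf on the MON host:
# [mon]
#     debug mon = 20
#     debug ms = 1

# Or start the daemon in debug/foreground mode with the options inline
ceph-mon -i mon3 -d --debug_mon 20 --debug_ms 1
```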

Maybe someone has a different idea, but if you get one MON up, I would
probably reduce the monmap to 1 MON to bring the cluster back up. Back
up all the MON stores, just in case you have to start over. Then
extract the monmap, remove all but one, and inject the modified monmap
into the MON you want to revive. The procedure is described here [0].
Just don't change the address but only reduce the monmap. ;-)
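
In rough commands, the procedure from [0] boils down to something like this (a sketch; all MONs stopped first, mon3 as the survivor, everything backed up beforehand):

```
# With ALL MONs stopped, on the surviving MON's host (mon3 here)
ceph-mon -i mon3 --extract-monmap /tmp/monmap

# Inspect, then drop the other two MONs from the map
monmaptool --print /tmp/monmap
monmaptool /tmp/monmap --rm mon1 --rm mon2

# Inject the reduced map and start only this MON
ceph-mon -i mon3 --inject-monmap /tmp/monmap
systemctl start ceph-mon@mon3
```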

Regards,
Eugen

[0]
https://docs.ceph.com/en/latest/rados/operations/add-or-rm-mons/#changing-a-monitor-s-ip-address-advanced-method

Quoting Miles Goodhew <c...@m0les.com>:

Hi,
  I've been called in by a client with an ancient SUSE-based Ceph
Nautilus (14.2.22) cluster whose MONs keep dying oddly.
  Apparently the issue started with MDS daemons not working, and
eventually a MON restart killed the cluster.

OS: SLES 15-SP1 (out of support)
Ceph: 14.2.22 "Nautilus" (Deployed with Salt... I think)
3 MONs; 5 MDSs; 3 MGRs; 4 RGWs; 336 OSDs on 21 nodes.
Client services: "one of everything at least", but RBD/OpenStack,
S3/RGW and CephFS are the big ones.

  After sorting through some of the logs, here are some things I know:
disk space, RAM availability, inodes and network connectivity seem
OK to me. After shutting down all the MONs, MGRs and MDSs, one MON
can usually be started, but it sits there spamming out log messages
like "[SERVICE_ID](probing) e6 handle_auth_request failed to assign
global_id" (maybe 50 - 100 times per second). All the while, the
syslog shows "e6 get_health_metrics reporting [INCREASING_NUMBER]
slow ops" fairly often. This is probably due to OSDs and clients
still being active.

  If I start one of the other MONs, the running one dies with a
stack trace (limited to C++/library-internal calls):

```
8: (std::__throw_out_of_range(char const*)+0x41) [0x7f2a5983fa07]
9: (MDSMonitor::maybe_resize_cluster(FSMap&, int)+0xcf0) [0x55b441e37490]
10: (MDSMonitor::tick()+0xc9) [0x55b441e38ce9]
11: (MDSMonitor::on_active()+0x28) [0x55b441e22fa8]
12: (PaxosService::_active()+0xdd) [0x55b441d7188d]
13: (Context::complete(int)+0x9) [0x55b441c888a9]
14: (void finish_contexts<std::__cxx11::list<Context*, std::allocator<Context*> > >(CephContext*, std::__cxx11::list<Context*, std::allocator<Context*> >&, int)+0xa8) [0x55b441cb2408]
15: (Paxos::finish_round()+0x76) [0x55b441d681b6]
16: (Paxos::handle_last(boost::intrusive_ptr<MonOpRequest>)+0xc1f) [0x55b441d693df]
17: (Paxos::dispatch(boost::intrusive_ptr<MonOpRequest>)+0x233) [0x55b441d69e23]
18: (Monitor::dispatch_op(boost::intrusive_ptr<MonOpRequest>)+0x1668) [0x55b441c820b8]
19: (Monitor::_ms_dispatch(Message*)+0xa3a) [0x55b441c82b5a]
20: (Monitor::ms_dispatch(Message*)+0x26) [0x55b441cb3646]
21: (Dispatcher::ms_dispatch2(boost::intrusive_ptr<Message> const&)+0x26) [0x55b441cb00b6]
22: (DispatchQueue::entry()+0x1279) [0x7f2a5b188379]
23: (DispatchQueue::DispatchThread::entry()+0xd) [0x7f2a5b238a5d]
24: (()+0x8539) [0x7f2a59db7539]
25: (clone()+0x3f) [0x7f2a58f87ecf]
```

Anyone got any clues about how to diagnose, or better yet repair, this?

Sorry, I know this is a bit half-baked, but I'm trying to dump this
help request at COB to see if I can hook anyone's interest overnight.

Thanks for at least reading this far,

M0les.
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io

