[ceph-users] Re: OSDs fail to start after stopping them with ceph osd stop command

Stefan Hanreich Fri, 27 Jan 2023 06:37:31 -0800

Seems like I accidentally only replied directly to Eugen, so here is my answer in case anyone encounters the same problem:

We were able to reproduce this issue. It is related to OSDs not catching up to the current epoch of the OSD map.

For the first few OSDs, restarting twice worked well, but the OSD map of the cluster advanced too many epochs in the meantime, so a lot of restarts were required for the last few OSDs (around 30, if I remember correctly).

It seems that, for some reason, the OSDs only advance 40 epochs on startup and then terminate. This caused the problem, since some OSDs were at around epoch 9500 while the OSD map of the cluster was already at 12000 or so. So we needed to restart them around 30 times before they finally caught up to the cluster state and started working again.
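To make the arithmetic behind the repeated restarts concrete, here is a minimal sketch. It assumes the simplified model described above: each start advances an OSD by at most 40 epochs, and the cluster map does not advance in the meantime. The function name restarts_needed and the epoch 10800 are hypothetical, and the epoch figures quoted in this message are approximate.

```python
import math


def restarts_needed(osd_epoch: int, cluster_epoch: int, epochs_per_start: int = 40) -> int:
    """Estimate how many starts an OSD needs to reach the cluster's map epoch.

    Assumption: every start advances the OSD by at most `epochs_per_start`
    epochs and the cluster map stays at `cluster_epoch` while this happens.
    """
    if epochs_per_start <= 0:
        raise ValueError("epochs_per_start must be positive")
    gap = max(0, cluster_epoch - osd_epoch)
    return math.ceil(gap / epochs_per_start)


# An OSD at epoch 9500 against a cluster at epoch 12000 (figures from this message)
print(restarts_needed(9500, 12000))   # 63

# Hypothetical OSD that is 1200 epochs behind
print(restarts_needed(10800, 12000))  # 30
```

Under this model, about 30 restarts corresponds to a gap of roughly 1200 epochs, so the remembered numbers should be read as rough orders of magnitude rather than exact values.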

I wanted to do a more detailed writeup soon, but sadly haven't gotten around to it yet.

Eugen also thankfully pointed out to me that there is the configuration value 'osd_map_share_max_epochs' with default value 40 that seems to govern this behavior. Hopefully I will find some time next week to look into this, but this looks very promising at first glance.


Kind Regards
Stefan

On 1/27/23 09:11, Eugen Block wrote:

Hi,
what Ceph version is this cluster running on? I tried the procedure you describe in a test cluster with 16.2.9 (cephadm) and all OSDs came up, although I had to start the containers twice (manually).
Regards,
Eugen

Quoting Stefan Hanreich <s.hanre...@proxmox.com>:
We encountered the following problems while trying to perform maintenance on a Ceph cluster:
The cluster consists of 7 nodes with 10 OSDs each.
There are 4 pools on it: 3 of them are replicated pools with 3/2 size/min_size and one is an erasure coded pool with m=2 and k=5.
The following global flags were set:

 * noout
 * norebalance
 * nobackfill
 * norecover
Then, after those flags were set, all OSDs were stopped via the command ceph osd stop, which seems to have caused the issue.
After maintenance was done, all OSDs were started again via systemctl. Only about half of the 70 OSDs came up at first, while the other half started but got killed after a few seconds with the following log messages:
ceph-osd[197270]: 2023-01-24T13:39:12.103+0100 7ff3fcf8d700 -1 osd.51 12161 map says i am stopped by admin. shutting down.
ceph-osd[197270]: 2023-01-24T13:39:12.103+0100 7ff40da55700 -1 received signal: Interrupt from Kernel ( Could be generated by pthread_kill(), raise(), abort(), alarm() ) UID: 0
ceph-osd[197270]: 2023-01-24T13:39:12.103+0100 7ff40da55700 -1 osd.51 12161 *** Got signal Interrupt ***
ceph-osd[197270]: 2023-01-24T13:39:12.103+0100 7ff40da55700 -1 osd.51 12161 *** Immediate shutdown (osd_fast_shutdown=true) ***
And indeed, when looking into the OSD map via ceph osd dump, the remaining OSDs seem to be marked as stopped:
osd.50 down out weight 0 up_from 9213 up_thru 9416 down_at 9760 last_clean_interval [9106,9207) [v2:10.0.1.61:6813/6211,v1:10.0.1.61:6818/6211] [v2:10.0.0.61:6814/6211,v1:10.0.0.61:6816/6211] exists,stop 9a2590c4-f50b-4550-bfd1-5aafb543cb59
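For checking many OSDs at once, the following is a minimal sketch that scans the plain-text output of ceph osd dump for entries carrying the stop flag. It assumes the state flags appear as one comma-separated, whitespace-delimited token (such as exists,stop) on each osd.N line, as in the entry above. The function name stopped_osds and the osd.12 sample line are hypothetical.

```python
from typing import List


def stopped_osds(osd_dump_text: str) -> List[int]:
    """Return the IDs of OSDs whose dump line carries the `stop` state flag.

    Assumption: each OSD entry starts with `osd.<id>` and lists its state
    flags as a single comma-separated token, e.g. `exists,stop`.
    """
    stopped = []
    for line in osd_dump_text.splitlines():
        tokens = line.split()
        if not tokens or not tokens[0].startswith("osd."):
            continue
        osd_id = tokens[0][len("osd."):]
        if not osd_id.isdigit():
            continue
        # Look for a comma-separated flag token that contains "stop".
        if any("stop" in token.split(",") for token in tokens[1:]):
            stopped.append(int(osd_id))
    return stopped


# First line is the entry quoted in this thread; the second is a made-up healthy OSD.
sample = """\
osd.50 down out weight 0 up_from 9213 up_thru 9416 down_at 9760 last_clean_interval [9106,9207) [v2:10.0.1.61:6813/6211,v1:10.0.1.61:6818/6211] [v2:10.0.0.61:6814/6211,v1:10.0.0.61:6816/6211] exists,stop 9a2590c4-f50b-4550-bfd1-5aafb543cb59
osd.12 up in weight 1 up_from 9213 up_thru 9416 down_at 0 last_clean_interval [9106,9207) exists,up 00000000-0000-0000-0000-000000000000
"""

print(stopped_osds(sample))  # [50]
```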
We were able to restore some of the remaining OSDs via running

ceph osd out XX
ceph osd in XX
and then starting the service again (via systemctl start). This did work for most OSDs, except for the OSDs that are located on one specific host. Some OSDs required several restarts until they did not kill themselves a few seconds after starting.
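The workaround described above can be written down as a short dry-run sketch that only builds the command lines and does not execute anything. It assumes the Ceph CLI argument order ceph osd out <id> / ceph osd in <id> and a non-containerized deployment where the systemd unit is named ceph-osd@<id>; the function name recovery_commands is hypothetical, and unit names differ under cephadm.

```python
from typing import List


def recovery_commands(osd_id: int) -> List[List[str]]:
    """Build the out/in/start command sequence for one OSD (dry run only).

    Assumptions: `ceph osd out|in <id>` argument order, and a systemd unit
    named `ceph-osd@<id>` (typical for non-containerized deployments).
    """
    if osd_id < 0:
        raise ValueError("osd_id must be non-negative")
    osd = str(osd_id)
    return [
        ["ceph", "osd", "out", osd],
        ["ceph", "osd", "in", osd],
        ["systemctl", "start", f"ceph-osd@{osd}"],
    ]


# Print the sequence for osd.50 so it can be reviewed before running it by hand.
for argv in recovery_commands(50):
    print(" ".join(argv))
```

Keeping this as a command builder rather than executing the commands directly leaves the decision to run each step, and how often to repeat the start, with the operator.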
This whole issue seems to be caused by the OSDs being marked as stopped in the OSD map [1]. Apparently this state should get reset when restarting the OSD [2], but for some reason this doesn't happen for some of the OSDs. This behavior seems to have been introduced via the following pull request [3]. We have also found the following commit, where the logic regarding stop seems to have been introduced [4].
We were looking into commands that reset the stopped status of the OSD in the OSD map, but did not find any way of forcing this.
Since we are out of ideas on how to proceed with the remaining 10 OSDs that cannot be brought up: How does one recover from this situation? It seems like by running ceph osd stop the cluster got into a state that seems irrecoverable with the normal CLI commands available. We even looked into the possibility of manually manipulating the osdmap via the osdmaptool, but there doesn't seem to be a way to edit the start/stopped status, and it also seems like a very invasive procedure. There does not seem to be any way of recovering from this that we can see, apart from rebuilding all the OSDs, which we have refrained from for now.
Kind Regards
Hanreich Stefan
[1] https://github.com/ceph/ceph/blob/63a77b2c5b683cb241f865daec92c046152175b4/src/osd/OSD.cc#L8240
[2] https://github.com/ceph/ceph/blob/63a77b2c5b683cb241f865daec92c046152175b4/src/osd/OSDMap.cc#L2353
[3] https://github.com/ceph/ceph/pull/43664
[4] https://github.com/ceph/ceph/commit/5dbae13ce0f5b0104ab43e0ccfe94f832d0e1268
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io