[ceph-users] Osds going down/flapping after Luminous to Nautilus upgrade part 2

Mark Kirkwood Wed, 31 Jul 2024 17:56:55 -0700

This 2nd post is about the next type of flapping osds we encountered after upgrading. We started to see osds going down with this in 'ceph -w':

2024-08-01 12:02:57.437135 mon.cat-hlz-stor001 [INF] osd.479 marked down after no beacon for 902.637005 seconds
2024-08-01 12:02:57.468372 mon.cat-hlz-stor001 [WRN] Health check failed: 1 osds down (OSD_DOWN)
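
To see which osds are affected and how long each went without a beacon, the "no beacon" lines can be pulled out of saved 'ceph -w' output. The following is only a rough sketch: the regular expression is inferred from the single log line quoted above, and the function name parse_beacon_timeouts is made up for illustration, not part of any Ceph tooling.

```python
import re

# Pattern inferred from the one log line quoted in this post (an assumption,
# not an official Ceph log format specification).
BEACON_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) "
    r"(?P<mon>\S+) \[INF\] osd\.(?P<osd>\d+) marked down\s*after "
    r"no beacon for (?P<gap>\d+(?:\.\d+)?) seconds"
)


def parse_beacon_timeouts(lines):
    """Return (osd_id, seconds_without_beacon, timestamp) for each match."""
    events = []
    for line in lines:
        m = BEACON_RE.match(line)
        if m:
            events.append((int(m["osd"]), float(m["gap"]), m["ts"]))
    return events


sample = [
    "2024-08-01 12:02:57.437135 mon.cat-hlz-stor001 [INF] osd.479 "
    "marked down after no beacon for 902.637005 seconds",
    "2024-08-01 12:02:57.468372 mon.cat-hlz-stor001 [WRN] Health check "
    "failed: 1 osds down (OSD_DOWN)",
]

events = parse_beacon_timeouts(sample)
for osd_id, gap, ts in events:
    print(f"osd.{osd_id} went {gap:.1f}s without a beacon (logged {ts})")
```

Counting the events per osd id over a longer capture would show whether the same osds keep getting marked down or whether it is spread across the cluster.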


We have the beacon interval set to 300. To fix this we tried:

- restarting osds
- restarting mons
- ntp tidyup
- restarting mgrs
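
For reference, the gap reported in the log can be compared against the beacon interval. This is a small sketch of the arithmetic only. It assumes the mon's mon_osd_report_timeout is at its default of 900 seconds (we have not confirmed the effective value on our mons), and the helper name missed_beacons is made up for illustration.

```python
import math

BEACON_INTERVAL = 300.0   # our configured beacon interval, in seconds
REPORT_TIMEOUT = 900.0    # assumption: default mon_osd_report_timeout
GAP = 902.637005          # seconds without a beacon, from the mon log line


def missed_beacons(gap_seconds, beacon_interval):
    """Number of whole beacon intervals that elapsed during the gap."""
    return math.floor(gap_seconds / beacon_interval)


missed = missed_beacons(GAP, BEACON_INTERVAL)
timed_out = GAP > REPORT_TIMEOUT

print(f"beacon intervals elapsed: {missed}")
print(f"gap exceeds report timeout: {timed_out}")
```

Under that assumption the osd was marked down almost immediately after the timeout expired, with about three consecutive beacons not registered by the mon.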

However, it is still happening. Poking around in the osd and mon logs we did see some lines that hinted that the mon might be listening for beacons using v1 - which could be broken (see part 1). Hence we restarted them again, but this did not have any effect.

Apart from enabling the v2 msgr we have not altered our Luminous config for Nautilus. Are we missing something?


Regards

Mark
