It sounds like all your nodes are on a single switch, which is risky in production, for this reason and others.
If that’s the case, I suggest shutting down the cluster completely in advance, as described in the docs.
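As a rough sketch only (not the full documented shutdown procedure -- check the maintenance/shutdown page for your release), the usual pattern is to set the flags Tyler mentions before the switch work and clear them once the network is back and "ceph -s" looks healthy:

    # stop OSDs from being marked out while the network is down
    ceph osd set noout
    # optionally also quiesce recovery/rebalance so nothing thrashes
    # while connectivity comes back in pieces
    ceph osd set norecover
    ceph osd set norebalance
    ceph osd set nobackfill

    # ... do the switch firmware update / reboot ...

    # once everything is reachable again, clear the flags
    ceph osd unset nobackfill
    ceph osd unset norebalance
    ceph osd unset norecover
    ceph osd unset noout

The "10 minutes" below is mon_osd_down_out_interval, which defaults to 600 seconds.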
> On May 29, 2022, at 9:10 PM, Jeremy Hansen <[email protected]> wrote:
>
> So in my experience so far, if I take out a switch after a firmware update
> and a reboot of the switch, meaning all ceph nodes lose network connectivity
> and communication with each other, Ceph becomes unresponsive and my only fix
> up to this point has been to, one by one, reboot the compute nodes. Are you
> saying I just need to wait? I don’t know how long I’ve waited in the past,
> but if you’re saying at least 10 minutes, I probably haven’t waited that long.
>
> Thanks
> -jeremy
>
>> On Sunday, May 29, 2022 at 3:40 PM, Tyler Stachecki
>> <[email protected]> wrote:
>>
>> Ceph always aims to provide high availability. So, if you do not set cluster
>> flags that prevent Ceph from trying to self-heal, it will self-heal.
>>
>> Based on your description, it sounds like you want to consider the 'noout'
>> flag. By default, after 10(?) minutes of an OSD being down, Ceph will begin
>> the process of outing the affected OSD to ensure high availability.
>>
>> But be careful, as far as latency goes -- you likely still want to
>> pre-emptively mark OSDs down ahead of the planned maintenance for latency
>> purposes, and you must be cognisant of whether or not your replication
>> policy puts you in a position where an unrelated failure during the
>> maintenance can result in inactive PGs.
>>
>> Cheers,
>> Tyler
>>
>>
>> On Sun, May 29, 2022, 5:30 PM Jeremy Hansen <[email protected]> wrote:
>>>
>>> Is there a maintenance mode for Ceph that would allow me to do work on
>>> underlying network equipment without causing Ceph to panic? In our test
>>> lab, we don’t have redundant networking and when doing switch upgrades and
>>> such, Ceph has a panic attack and we end up having to reboot Ceph nodes
>>> anyway. Like an hdfs style readonly mode or something?
>>>
>>> Thanks!
_______________________________________________
ceph-users mailing list -- [email protected]
To unsubscribe send an email to [email protected]
