For best results I use an out-of-band network device to cut power to devices and reboot them when they fail the watchdog criteria.
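
A minimal sketch of one pass of such an external watchdog, assuming the criteria are "host stops answering ping" or "service is still down after one restart attempt". All function names here are hypothetical, `power_cycle` stands in for whatever call your out-of-band power switch actually provides, and the `ping` flags assume Linux iputils:

```python
import subprocess
from typing import Callable


def host_pings(host: str, timeout_s: int = 2) -> bool:
    """Return True if a single ICMP echo succeeds (Linux iputils `ping` assumed)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def should_power_cycle(pings: bool, service_ok: bool, restart_attempted: bool) -> bool:
    """Fence if the host is unreachable, or the service is still down after a restart."""
    if not pings:
        return True
    return (not service_ok) and restart_attempted


def watchdog_pass(
    host: str,
    check_ping: Callable[[str], bool],
    check_service: Callable[[str], bool],
    restart_service: Callable[[str], None],
    power_cycle: Callable[[str], None],  # hypothetical hook for the power switch
) -> str:
    """Run one check of `host` and return 'healthy', 'restarted' or 'power-cycled'."""
    pings = check_ping(host)
    service_ok = pings and check_service(host)
    restart_attempted = False

    # Try the gentle fix first: restart the service, then check it again.
    if pings and not service_ok:
        restart_service(host)
        restart_attempted = True
        service_ok = check_service(host)

    # Last resort: cut and restore power through the out-of-band device.
    if should_power_cycle(pings, service_ok, restart_attempted):
        power_cycle(host)
        return "power-cycled"
    return "restarted" if restart_attempted else "healthy"
```

In practice you would call `watchdog_pass` from a timer or a monitoring hook, passing `host_pings` as `check_ping`. Keeping the power switch behind a plain callable means the decision logic can be tested without touching real hardware.
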
Normally that means a device stops pinging, or a service still isn't responding after a Nagios plugin has tried to restart it. I would have a look at webpowerswitch.com. I use this with PCS and a GFS2 cluster to enforce fencing and recover the fenced nodes. Works well.

On Sat, Jan 7, 2023 at 3:12 PM Pierre-Francois Renard <pfren...@gmail.com> wrote:

> Hello guys,
>
> I am running 6 RPI4s with Fedora 37. K3S is powering this cluster and it
> is working well :)
>
> But from time to time, 1 RPI is randomly hanging.
>
> I am thinking about implementing a watchdog:
>
> - software based, using embedded linux kernel
> - hardware based such as https://www.omzlo.com/articles/the-piwatcher
>
> Do you have any experience with either of these two solutions? Do you have
> alternatives?
>
> By the way your job is fantastic and it is a great pleasure to be able
> to run F37 on aarch64 so easily!
>
> Thanks a lot
>
> _______________________________________________
> arm mailing list -- arm@lists.fedoraproject.org
> To unsubscribe send an email to arm-le...@lists.fedoraproject.org
> Fedora Code of Conduct:
> https://docs.fedoraproject.org/en-US/project/code-of-conduct/
> List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
> List Archives:
> https://lists.fedoraproject.org/archives/list/arm@lists.fedoraproject.org
> Do not reply to spam, report it:
> https://pagure.io/fedora-infrastructure/new_issue