On Mon, 23 Oct 2023, Karl Auerbach via Nnagain wrote:
> It would be nice if we built our network devices so that they each had a
> little introspective daemon that frequently asked "am I healthy, am I
> still connected, are packets still moving through me?" (For consumer
> devices an answer of "no" could trigger a full device reboot or reset.)
I agree with a lot of what you say, but I want to throw in a word of caution
here. I have seen systems go from 'slow but functioning' to 'completely down and
requires a complete datacenter shutdown to recover' because of automated
response systems that decided to restart something when it didn't respond fast
enough, triggering a cascade of failures that prevented any service from being
able to start into a healthy state.
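
One common guard against that kind of restart cascade is to give the automation
a restart budget: it may restart a service only a few times per time window, and
after that it has to stop and hand the problem to a human. The following is a
minimal sketch of that idea, not a description of any system mentioned here. The
class name RestartBudget, its parameters, and the default limits are all
hypothetical.

```python
from collections import deque


class RestartBudget:
    """Hypothetical limiter: at most max_restarts automated restarts per window."""

    def __init__(self, max_restarts: int = 2, window_s: float = 600.0):
        self.max_restarts = max_restarts  # assumed limit, tune per service
        self.window_s = window_s          # assumed window length in seconds
        self._stamps = deque()            # times of restarts still inside the window

    def allow(self, now: float) -> bool:
        """Return True and record the restart if the budget permits one at `now`."""
        # Forget restarts that have aged out of the window.
        while self._stamps and now - self._stamps[0] >= self.window_s:
            self._stamps.popleft()
        if len(self._stamps) >= self.max_restarts:
            # Budget spent: do not restart again, escalate to a person instead.
            return False
        self._stamps.append(now)
        return True
```

A caller would pass in a monotonic timestamp (for example time.monotonic()) and
restart only when allow() returns True, so a slow service gets a bounded
number of automated kicks instead of an endless loop of them.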
I've also implemented monitoring on APs that restarted them if they didn't have
a path to the Internet, which resulted in continual reboots whenever there was a
transitory issue. (It has since been changed to check only the next hop, and to
only shut down wifi so that the AP avoids becoming a black hole for that SSID.)
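
The gentler AP behaviour described above (watch only the next hop, and turn the
radio off rather than reboot) could be sketched roughly as follows. This is an
illustration under assumptions, not the actual monitoring code: the ApWatchdog
name, the thresholds, and the action strings are hypothetical, and how the next
hop is probed or how wifi is toggled is left to the caller.

```python
from dataclasses import dataclass


@dataclass
class ApWatchdog:
    """Hypothetical AP watchdog: disables wifi instead of rebooting the AP."""

    fail_threshold: int = 3     # assumed: consecutive failed probes before acting
    recover_threshold: int = 2  # assumed: consecutive good probes before re-enabling
    wifi_enabled: bool = True
    fails: int = 0
    oks: int = 0

    def observe(self, next_hop_reachable: bool) -> str:
        """Feed one next-hop probe result; return the action to take."""
        if next_hop_reachable:
            self.oks += 1
            self.fails = 0
            if not self.wifi_enabled and self.oks >= self.recover_threshold:
                self.wifi_enabled = True
                return "enable_wifi"
        else:
            self.fails += 1
            self.oks = 0
            if self.wifi_enabled and self.fails >= self.fail_threshold:
                # Stop advertising the SSID so clients roam elsewhere.
                self.wifi_enabled = False
                return "disable_wifi"
        return "none"
```

Requiring several consecutive failures before acting, and several consecutive
successes before undoing it, keeps a transitory blip from flapping the radio.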
To err is human; to really mess things up requires a computer. Automation
removes the oversight from the computer, allowing it to do more damage faster.
David Lang