I couldn't agree with you more - we need to take care that any control or management systems we create are not turn-off-the-Internet switches in disguise.

My even larger fear is that, as we increasingly cross-link our various forms of infrastructure, protective measures will put us into a (hopefully transient) neo-stone age with a long, difficult recovery.

I have long been interested in comparing the relatively robust responses of living organisms to the rather brittle responses of our technologies.

Living things have an option that is not usually available to our technologies - death of the individual.

The one lesson I've been able to draw out so far is that living things often have layers of responsive mechanisms. These arise because evolutionary processes typically do not erase old machinery but, rather, add new responses. If a new response proves inadequate, the old mechanisms are still there and might offer a useful solution to whatever condition has arisen.

The corollary that I derived from that is that we ought to be designing our network systems with layers of response machinery, often working somewhat at cross purposes, and with the goal being survival rather than optimal use of resources.

How to do this in practice remains somewhat elusive, at least to me.

    --karl--

On 10/23/23 4:39 PM, David Lang wrote:
On Mon, 23 Oct 2023, Karl Auerbach via Nnagain wrote:

It would be nice if we built our network devices so that they each had a little introspective daemon that frequently asked "am I healthy, am I still connected, are packets still moving through me?"  (For consumer devices an answer of "no" could trigger a full device reboot or reset.)

I agree with a lot of what you say, but I want to throw in a word of caution here. I have seen systems go from 'slow but functioning' to 'completely down and requires a complete datacenter shutdown to recover' because of automated response systems that decided to restart something when it didn't respond fast enough, triggering a cascade of failures that prevented any service from being able to start into a healthy state.

I've also implemented monitoring on APs to restart them if they don't have a path to the Internet, resulting in continual reboots when there is a transitory issue (now changed to only check their next hop and only shut down wifi to avoid becoming a black hole for that SSID).
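As a rough illustration of that revised approach, here is a minimal Python sketch of a watchdog that checks only the next hop, requires several consecutive failures before acting, takes the mildest action (disabling wifi rather than rebooting), and rate-limits its own state changes. Everything in it is an assumption made for illustration: the class name NextHopWatchdog, the callbacks check_next_hop, disable_wifi and enable_wifi, and the thresholds are hypothetical and do not describe the actual AP monitoring mentioned above.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class NextHopWatchdog:
    """Hypothetical watchdog; names and thresholds are illustrative assumptions."""
    check_next_hop: Callable[[], bool]   # assumed probe, e.g. ping the default gateway
    disable_wifi: Callable[[], None]     # mild action: stop serving the SSID
    enable_wifi: Callable[[], None]      # undo the mild action
    failures_needed: int = 3             # consecutive failures before acting
    successes_needed: int = 2            # consecutive successes before recovering
    min_action_gap: float = 60.0         # minimum seconds between state changes
    wifi_up: bool = True
    _fail: int = 0
    _ok: int = 0
    _last_action: float = float("-inf")

    def tick(self, now: float) -> str:
        """Run one health check at time `now` (seconds) and return what happened."""
        healthy = self.check_next_hop()
        if healthy:
            self._ok += 1
            self._fail = 0
        else:
            self._fail += 1
            self._ok = 0

        # Rate limit: never flap faster than min_action_gap, so a transitory
        # problem cannot turn into a continual cycle of actions.
        if now - self._last_action < self.min_action_gap:
            return "hold"

        if self.wifi_up and self._fail >= self.failures_needed:
            self.disable_wifi()
            self.wifi_up = False
            self._last_action = now
            return "wifi_disabled"

        if not self.wifi_up and self._ok >= self.successes_needed:
            self.enable_wifi()
            self.wifi_up = True
            self._last_action = now
            return "wifi_enabled"

        return "ok" if healthy else "degraded"


if __name__ == "__main__":
    # Simulated probe results instead of real network calls.
    probe = iter([False, False, False, True, True, True])
    dog = NextHopWatchdog(
        check_next_hop=lambda: next(probe),
        disable_wifi=lambda: print("wifi off"),
        enable_wifi=lambda: print("wifi on"),
    )
    for t in (0, 10, 20, 30, 40, 90):
        print(t, dog.tick(t))
```

The design choice worth noting is that the watchdog never reboots anything: a single slow or missed check only moves it to "degraded", and even a confirmed failure triggers a reversible action that cannot be repeated until the gap has elapsed.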

To err is human; to really mess things up requires a computer. Automation removes the oversight from the computer, allowing it to do more damage faster.

David Lang
_______________________________________________
Nnagain mailing list
[email protected]
https://lists.bufferbloat.net/listinfo/nnagain
