On 2016-10-18 05:25, li...@wrant.com wrote:
Mon, 17 Oct 2016 18:00:39 +0200 Karel Gardas <gard...@gmail.com>
1) use machine with proper ECC support


Hello Karel,

Please explain this "proper ECC support" for every laptop user out there?
[..]
Mon, 17 Oct 2016 21:48:47 +0800 Tinker <ti...@openmailbox.org>
Sometimes a machine goes unresponsive. In this case, a non-ECC RAM
machine.

Hello Tinker,

This is one very intriguing problem with a very trivial solution: reboot. The idea to work around missing ECC support with software is as practical
[..]

Hi Anton,

You misread me:

What I asked about was not how to trigger some event logic on bit-flip errors (on a non-ECC machine those will generally show up only as data corruption or undefined behavior) or on other hardware or kernel errors, but:

How to trigger some event logic when the system has become a vegetable because the userland has overloaded it?


My limited experience says that system overload caused by user processes can make all processes die or freeze and leave the system otherwise unresponsive, except that terminal input is still echoed.

For that case I speculated that such event logic could be implemented as in-kernel code, e.g. as a kernel thread, if those have some kind of stronger execution guarantee than user process code.

E.g., when a userland watchdog/monitoring process hasn't sent an "I'm OK" signal to that thread for 60 seconds, the thread would dump the system's state to the console and reboot the machine.
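
To make the idea concrete, here is a minimal sketch of the timeout logic, written as a userland simulation in Python rather than as actual kernel code. The names HeartbeatWatchdog, kick, expired and check are made up for illustration, and dump_state and reboot are placeholder callbacks, not existing kernel routines:

```python
import time


class HeartbeatWatchdog:
    """Tracks the last "I'm OK" heartbeat and reports when it is overdue."""

    def __init__(self, timeout=60.0, clock=time.monotonic):
        self.timeout = timeout      # seconds of silence tolerated
        self.clock = clock          # injectable so the logic can be tested
        self.last_kick = clock()

    def kick(self):
        """Called by the userland monitor to say "I'm OK"."""
        self.last_kick = self.clock()

    def expired(self):
        """True once no heartbeat has arrived for more than `timeout` seconds."""
        return self.clock() - self.last_kick > self.timeout


def check(watchdog, dump_state, reboot):
    """One pass of the supervising thread: dump state and reboot if overdue."""
    if watchdog.expired():
        dump_state()   # hypothetical: print system state to the console
        reboot()       # hypothetical: restart the machine
        return True
    return False
```

In the real thing, check would have to run from something that still gets scheduled when the userland is starved, which is exactly the guarantee I am asking about.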

This way I'd be able to distinguish userland-caused system crashes from hardware/kernel crashes, as the former would always produce that output and reboot, whereas the latter don't (they instead reboot, crash to the kernel debug console, or just freeze the system altogether).

Do you see where I was heading now?

Tinker
