OK, I set up some processes to fork makes under several users, each building
a software package (nmap) locally in that user's home directory
(/home/username). Every process started when a trigger file appeared in / and
waited a random number of seconds between 0 and 20 before beginning its make
loop.
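
For anyone wanting to repeat this, the per-user worker described above could be
sketched roughly as follows. This is only an illustration, not the script I
ran: the trigger path /trigger, the build directory ~/nmap, and the use of
"make clean" between builds are all assumptions made for the example.

```python
import os
import random
import subprocess
import time


def wait_for_trigger(path, poll_interval=0.5):
    """Block until the trigger file exists."""
    while not os.path.exists(path):
        time.sleep(poll_interval)


def start_delay(rng=random, low=0.0, high=20.0):
    """Pick a random start delay in seconds (0 to 20 by default)."""
    return rng.uniform(low, high)


def make_loop(build_dir, max_rounds=None, runner=subprocess.run):
    """Rebuild the package repeatedly; runs forever if max_rounds is None.

    Running "make clean" before each build is an assumption, used here so
    every round does real disk work.
    """
    rounds = 0
    while max_rounds is None or rounds < max_rounds:
        runner(["make", "clean"], cwd=build_dir, check=False)
        runner(["make"], cwd=build_dir, check=False)
        rounds += 1
    return rounds


def main(trigger="/trigger", build_dir=os.path.expanduser("~/nmap")):
    # Hypothetical paths: adjust both to match the actual setup.
    wait_for_trigger(trigger)
    time.sleep(start_delay())
    make_loop(build_dir)
```

Calling main() once under each test user, and then creating the trigger file
as root, would start all the builds at staggered times.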
And then I waited a random number of seconds between 40 and 120 before pressing
the reset switch.
I did this 100 times.
I timed the recovery. (What an interesting way to spend an uneventful night,
right up there with watching trees grow.)
Average recovery time from filesystem message to "PASSED": 8.7 seconds
Extreme range: 3.3 to 11.7 seconds
Number of corrupted files: ZERO
Number of requests for manual fixes: ZERO
I think it might take a better strategy than mine to break it by a random
reset. I am running it under heavy load now, and no whimpers yet.
Civileme