Occasionally after a power loss some computers, especially virtual machines for
obvious reasons, are no longer able to boot. The bootloader reads the kernel,
one of the two spins for a bit and then the computer returns to the bootloader
prompt. In the case of VMs, vmd eventually gives up and turns it off. After
getting into an affected OS, /bsd doesn't match the sha in /var.
The repair is easy thanks to the simple design - load bsd.booted into single
user mode and replace /bsd (and the sha in /var) with it then boot as normal,
taking care to let KARL finish this time. Obviously "don't keep losing power"
is another solution, as is "get rid of the cats who keep knocking over the
router", but neither is really workable and only hide the error anyway.
The problem stems from here I think:
newinstall:
umask 077 && cp bsd /nbsd && mv /nbsd /bsd && \
sha256 -h /var/db/kernel.SHA256 /bsd
I guess, and it is only a guess, that what's happening is that the filesystem
doesn't finish writing /nbsd's data even after it's been moved to /bsd, so on
my computers here I've put 'sync' between the cp and the mv to see if it
resolves it (it will go away of course when 6.6 is installed) and I'll try to
find the time to reproduce the problem reliably and hopefully produce a fix
that's not essentially shotgun-debugging.
In the meantime does that sound like a likely source of the problem, and if so
is there a better way to fix it?
Matthew