Occasionally after a power loss some computers, especially virtual machines for 
obvious reasons, are no longer able to boot. The bootloader reads the kernel, 
one of the two spins for a bit and then the computer returns to the bootloader 
prompt. In the case of VMs, vmd eventually gives up and turns it off. After 
getting into an affected OS, /bsd doesn't match the sha in /var.

The repair is easy thanks to the simple design - load bsd.booted into single 
user mode and replace /bsd (and the sha in /var) with it then boot as normal, 
taking care to let KARL finish this time. Obviously "don't keep losing power" 
is another solution, as is "get rid of the cats who keep knocking over the 
router", but neither is really workable and only hide the error anyway.

The problem stems from here I think:

newinstall:
        umask 077 && cp bsd /nbsd && mv /nbsd /bsd && \
            sha256 -h /var/db/kernel.SHA256 /bsd

I guess, and it is only a guess, that what's happening is that the filesystem 
doesn't finish writing /nbsd's data even after it's been moved to /bsd, so on 
my computers here I've put 'sync' between the cp and the mv to see if it 
resolves it (it will go away of course when 6.6 is installed) and I'll try to 
find the time to reproduce the problem reliably and hopefully produce a fix 
that's not essentially shotgun-debugging.

In the meantime does that sound like a likely source of the problem, and if so 
is there a better way to fix it?

Matthew

Reply via email to