Re: Recovering from ERR M at boot on a collocated server with a home made "rescue" partition available.

Nick Holland Wed, 12 Aug 2009 19:38:38 -0700

Great details, but I'm snipping them for length.

btw: you forgot the part about how horrible we are for letting this happen
to you, so I can't ignore your polite plea for help.  dang. :)

artligh...@free.fr wrote:
...
> fdisk wd0
> Disk: wd0       geometry: 19457/255/63 [312581808 Sectors]
> Offset: 0       Signature: 0xAA55
>              Starting         Ending        LBA Info:
>  #: id      C   H  S -      C   H  S [      start:        size ]
> ------------------------------------------------------------------------
>  0: 10      0   1  1 -    100 254 63 [         63:     1622502 ] OPUS
> *1: A6    101   0  1 -  19456 254 63 [    1622565:   310954140 ] OpenBSD
>  2: 00      0   0  0 -      0   0  0 [          0:           0 ] unused
>  3: 00      0   0  0 -      0   0  0 [          0:           0 ] unused
> 
> The first Opus(10) partition is in fact an OpenBSD install. I switch
> between the two by making slice 0 active, setting its OS type to
> OpenBSD (A6), and setting slice 1 to OS type Opus (10). I was probably
> too smart for my own good at the time. :)

yeah, I was too smart that way once, too.  I suspect I know what you did.

The problem with this "recovery partition" idea is "how do you switch
between partitions?".  The impulse is to call up fdisk, and change the
partition types and flag the other one as active.

This is a good way to end up with a non-bootable system.  I'm not
entirely sure why, and I'll make a complete fool of myself and speculate,
but it seems the OS notices you changed the default boot partition
and starts writing stuff to the new 'a' partition (or something) based
on what it knew of the old one...  I suspect that's completely wrong,
but it provides a good mental model of why you shouldn't do that.

(IF you want to multiboot that way, do your fdisk changes with bsd.rd.
Or just get a second computer.  It's not worth the headaches!)

> ===============================================================
> A KVM showed a failure at boot time of the "ERR M" kind ...
> ===============================================================
>  I looked it up and understand that it is linked to boot(8) being 
> corrupt and not knowing where to look for the kernel. 

yeah, whatever the PBR grabbed, it didn't look like /boot.  What the
PBR is supposed to grab is hard-coded in the PBR by installboot.
So, if something overwrites /boot or damages it, or the PBR is
pointing into something odd, you get the ERR M.
(There is very little space available in the PBR, which is why you
get five-character error codes...)

> Installboot should
> be a solution but I don't know how to run it in my situation. 

following a really good fsck'ing, maybe...
 ...
> Question 1
> 
> Is it possible to do that? i.e. reinstall the boot(8) block on slice 1,
> referring to the slice 1 kernel on that partition?

not easily... (i.e., I can't think of how... and I can think of why
it might not be possible with existing code).

> To me it means being able to mount slice 1 from slice 0, no?
> 
> which leads to my second question:
> 
> Question 2 
> 
> Either to rescue the slice 1 setup or to retrieve my data, how can I
> make slice 1 visible and mount it from slice 0? I have a record of the
> exact layout of both slices. Currently disklabel from slice 0 shows me
> this:
> 
> # disklabel wd0
> # Inside MBR partition 0: type A6 start 63 size 1622502
> # /dev/rwd0c:
> type: ESDI
> disk: ESDI/IDE disk
> label: ST3160812AS
> flags:
> bytes/sector: 512
> sectors/track: 63
> tracks/cylinder: 255
> sectors/cylinder: 16065
> cylinders: 19457
> total sectors: 312581808
> rpm: 3600
> interleave: 1
> trackskew: 0
> cylinderskew: 0
> headswitch: 0           # microseconds
> track-to-track seek: 0  # microseconds
> drivedata: 0
> 
> 16 partitions:
> #                size           offset  fstype [fsize bsize  cpg]
>   a:          1333332               63  4.2BSD   2048 16384    1
>   b:           289170          1333395    swap
>   c:        312581808                0  unused      0     0
>   i:        310954140          1622565 unknown
> 
> Obviously the i partition is the whole slice 1.
> 
> I want to make it look like this:
> 16 partitions:
> #                size           offset  fstype [fsize bsize  cpg]
>   a:          4194307          1622565  4.2BSD   2048 16384    1
>   b:          4194304          5816872    swap
>   c:        312581808                0  unused      0     0
>   d:          4194304         10011176  4.2BSD   2048 16384    1
>   e:         62914560         14205480  4.2BSD   2048 16384    1
>   g:         62914560         77120040  4.2BSD   2048 16384    1
>   h:        125829120        140034600  4.2BSD   2048 16384    1
> 
> (I recorded the layout of slice 1 when configuring the machine.)

go pat yourself on the back.  you probably saved your own butt. :)

> Is it right to do this with disklabel from the slice 0 OpenBSD?
> 
> Would doing it have any adverse effect on my slice 1 data?
> 
> Am I right in understanding that it will only change my slice 0
> disklabel, giving it knowledge of the slice 1 layout, but not write
> anything to slice 1?

What I'd do first is extend the fdisk partition that is currently
active (0, I believe) to cover the entire disk.  I believe you can
make it overlap your existing partition 1 without issue, in case you
want to revert.
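The numbers for that first step are plain arithmetic on the fdisk output quoted above. A minimal sh sketch, assuming you extend partition 0 only as far as partition 1 already ends (the last full cylinder) rather than to the very last sector of the disk; it only computes and prints, it touches nothing:

```shell
#!/bin/sh
# Values copied from the "fdisk wd0" output quoted above.
P0_START=63          # MBR partition 0 (the "recovery" install)
P1_START=1622565     # MBR partition 1 (production OpenBSD)
P1_SIZE=310954140

# First sector past the end of partition 1.
P1_END=$((P1_START + P1_SIZE))

# Size partition 0 needs so that, keeping its current start,
# it ends exactly where partition 1 ends (overlapping all of it).
P0_NEW_SIZE=$((P1_END - P0_START))

echo "partition 1 ends before sector $P1_END"
echo "new size for partition 0: $P0_NEW_SIZE"
```

With these values partition 1 ends at sector 312576705 (exactly 19457 cylinders of 16065 sectors), so partition 0 would become 312576642 sectors long.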

Now, using disklabel, make NEW disklabel partitions that PRECISELY
match the chunks of your fdisk partition 1 (the layout you recorded),
but with different, non-conflicting letters.
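Before typing anything into disklabel, it is worth checking the recorded layout on paper. A minimal sh sketch, using the sizes and offsets quoted above, that confirms each partition starts at or after the end of the previous one and ends inside slice 1. The new letters (d e f g h j) are only a hypothetical choice that avoids a, b, c and i, which the slice 0 label already uses:

```shell
#!/bin/sh
# Slice 1 as reported by fdisk: first sector and first sector past its end.
SLICE_START=1622565
SLICE_END=$((1622565 + 310954140))

# Recorded slice 1 layout, one partition per line: old-letter size offset.
LAYOUT='a 4194307 1622565
b 4194304 5816872
d 4194304 10011176
e 62914560 14205480
g 62914560 77120040
h 125829120 140034600'

# Hypothetical replacement letters, assigned in order to the lines above.
NEW_LETTERS='d e f g h j'

errors=0
next=$SLICE_START            # earliest sector the next partition may use
set -- $NEW_LETTERS
while read -r old size offset; do
    end=$((offset + size))
    if [ "$offset" -lt "$next" ]; then
        echo "$old overlaps the previous partition (or starts before slice 1)"
        errors=$((errors + 1))
    fi
    if [ "$end" -gt "$SLICE_END" ]; then
        echo "$old runs past the end of slice 1"
        errors=$((errors + 1))
    fi
    echo "$old -> $1: size $size offset $offset (ends before $end)"
    shift
    next=$end
done <<EOF
$LAYOUT
EOF
echo "errors: $errors"
```

With the recorded numbers it prints one line per partition and "errors: 0": the layout is contiguous and ends at sector 265863720, well inside slice 1. The letters are interchangeable; what has to be copied exactly are the sizes, offsets and fstypes.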

Practice the maneuvers and manipulations on a local machine. :)

If you do this carefully, you should be able to rebuild your
system so it boots off your "recovery" 'a' partition as root, and
mount all your production partitions where they were originally.

> Question 3 
> 
> Does anybody have a bright idea for a solution, or things to do that
> might have escaped my limited mind? I am open to any pointers or
> suggestions before I do some irreparable harm to my disks.

yeah, I think I got one for you... (or two... or three!)
Sounds like you have console (or at least a human) on the machine,
so I would start by booting bsd.rd off your fdisk 0 partition,
then using fdisk to re-arrange the partitions to your "production"
configuration.  Now, fsck all your production partitions, then
mount them to see how they look.  IF your 'a' partition is in good
shape, copy over a new /boot (probably to /mnt/boot), and then run
installboot.

Since you are asking so nicely and probably under more than a little
stress, I'll even give you guidance on the part of installboot that
people usually screw up:

Assuming you mount your /dev/wd0a on /mnt, you will probably want
to do something similar to:

    # mount /dev/wd0a /mnt
    # cd /usr/mdec        # (that's on the bsd.rd)
    # cp boot /mnt
    # installboot -v /mnt/boot biosboot wd0

(there's probably something I typed wrong there; if you understand what
I am about to say, you can fix whatever I told you wrong. :)

The gotcha here is that you want installboot to install *the* /boot
that *will* load your OS, not the /boot that loaded the currently
running kernel.  The one you want is currently at /mnt/boot, even
though when you boot from it, it will be /boot.  People regularly do
that part wrong, and set up markers in their PBR pointing to the /boot
on their ramdisk kernel (or floppy or wrong HD or ...).

biosboot is the PBR boot code; it can be pulled out of the current
directory, and it will be installed in the appropriate place on the
indicated HD.
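To make the gotcha concrete, here is a small sh sketch with a made-up helper, installboot_cmd (hypothetical, not part of OpenBSD), that only prints the installboot line from the example above for a given mount point and disk. It refuses an empty mount point or "/", since from bsd.rd a bare /boot is the ramdisk's. Like the example, it assumes you are sitting in /usr/mdec so biosboot is found in the current directory:

```shell
#!/bin/sh
# Hypothetical helper: print (do NOT run) the installboot command for a
# root filesystem mounted at $1 on disk $2, e.g. /mnt and wd0.
installboot_cmd() {
    mnt=$1 disk=$2
    case $mnt in
        ""|/)
            # From bsd.rd, /boot is the ramdisk's boot, not the one
            # that will load your OS from the hard disk.
            echo "refusing: mount the target root somewhere like /mnt" >&2
            return 1 ;;
    esac
    echo "installboot -v $mnt/boot biosboot $disk"
}

installboot_cmd /mnt wd0
```

Printing instead of running keeps a human in the loop: paste the line only after checking that it names the /boot on the mounted disk and the disk you actually mean.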

You should actually be able to walk a person through this process
over the phone if you can't get console directly.

ANOTHER solution: load up a USB flash drive, mail it to 'em, and
have them boot the machine from that; then you can poke around
and fix the thing up.

boy, the more tired I get the more good ideas I have... if you can
do this, it wins the prize for simplicity and ease:  switch the
partitions back around for your production config, have them
make an OpenBSD boot floppy or CD, have them start to boot from it,
and when they get to the boot> prompt, have them enter:
    boot hd0a:/bsd
If the only problem is your boot code, you just got around it by
loading boot code off CD or floppy.  You can now run installboot
from the running system.

Key thing, though: if you have to do anything fancy, practice
locally first.

Good luck...

Nick.
