I had occasion today to reboot one of my shiny new Xen servers for the
first time in a month, and it failed to boot. The cause was a new dk(4)
attachment, created for a GPT partition on another drive, that had
appeared since the previous successful boot. The console showed:
boot device: dk0
root on dk0
Supported file systems: union umap tmpfs smbfs puffs ptyfs procfs
overlay null ntfs nfs msdos mfs lfs kern cd9660
no file system for dk0 (dev 0xa800)
cannot mount root, error = 79
root device (default dk0):
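For reference, the "dev 0xa800" in that message can be decoded by hand. What follows is only a minimal sketch: it assumes the dev_t bit layout that I believe NetBSD's major() and minor() macros use (worth checking against <sys/types.h>), and decode_dev is a hypothetical helper name, not an existing tool.

```shell
#!/bin/sh
# decode_dev: split a dev_t value into major and minor numbers.
# ASSUMPTION: major is bits 8-19; minor is bits 20-31 shifted down by 12,
# OR-ed with bits 0-7. Verify against <sys/types.h> on your system.
decode_dev() {
    dev=$1
    major=$(( ($dev & 0x000fff00) >> 8 ))
    minor=$(( (($dev & 0xfff00000) >> 12) | ($dev & 0x000000ff) ))
    echo "major=$major minor=$minor"
}
```

Running `decode_dev 0xa800` gives a major/minor pair that can be compared against the output of `ls -l /dev/dk0` to confirm the kernel really was trying to mount the wedge.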
The problem here is that the system boots from sd0 and root is on sd0a!!!
Worse yet, dk0 is not even on sd0, it's a wedge on sd1:
sd1 at scsibus1 target 1 lun 0: <DELL, PERC 6/i, 1.11> disk fixed
sd1: fabricating a geometry
sd1: 1861 GB, 1905664 cyl, 64 head, 32 sec, 512 bytes/sect x 3902799872 sectors
sd1: fabricating a geometry
sd1: GPT GUID: e171fce5-0937-49de-ab2a-399ac308a695
dk0 at sd1: percraid0
dk0: 3902795776 blocks at 2048, type:
The server is running a recent-ish NetBSD 7.99.5 XEN3_DOM0 kernel
(from Feb. 20), under Xen-4.5.
I used the following commands to put a GPT label on sd1 and create a
wedge there, which shows up as dk0 and which I then use for LVM:
dd if=/dev/zero of=/dev/rsd1d bs=8k count=1
gpt create sd1
gpt add -a 512k -l percraid0 sd1
dkctl sd1 makewedges
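As a sanity check on what those commands produced, the wedge geometry reported by the kernel can be compared against the "-a 512k" alignment. This is a rough sketch only: it assumes the 512-byte sectors shown in the sd1 probe output (so 512k is 1024 sectors), and check_wedge is a hypothetical helper name.

```shell
#!/bin/sh
# check_wedge START BLOCKS TOTAL ALIGN   (all values in sectors)
# Prints whether START is a multiple of ALIGN (1 = yes, 0 = no), the first
# sector past the wedge, and how many sectors remain after it on the disk.
check_wedge() {
    start=$1; blocks=$2; total=$3; align=$4
    end=$(( start + blocks ))
    echo "start_aligned=$(( start % align == 0 )) end=$end tail=$(( total - end ))"
}
```

With the numbers from the output above, that is `check_wedge 2048 3902795776 3902799872 1024`.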
As far as I know this should not make the wedge appear bootable, and I
would not expect the kernel to treat this wedge as special in any way --
i.e. especially not to override the boot device specified by the loader.
# dkctl sd1 listwedges
/dev/rsd1d: 1 wedge:
dk0: percraid0, 3902795776 blocks at 2048, type:
Note the wedge "type" is blank. The manual doesn't seem to list a wedge
type that would be valid for LVM use, though maybe ccd or swap or unused
would suffice. Apart from this boot problem, it works fine with no type.
I did nothing special to leave the type unset -- I just ran "makewedges".
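To spot other wedges in the same state, the listwedges output can be filtered for an empty type field. This is a minimal sketch that assumes the output format is exactly as shown above (one "dkN: label, N blocks at N, type: ..." line per wedge); list_untyped_wedges is a hypothetical helper name.

```shell
#!/bin/sh
# list_untyped_wedges: read `dkctl <disk> listwedges` output on stdin and
# print the name of every wedge whose "type:" field is empty.
# ASSUMPTION: wedge lines start with "dkN: " and end with "type: <type>".
list_untyped_wedges() {
    awk '/^dk[0-9]+: / && /type:[ \t]*$/ { sub(/:$/, "", $1); print $1 }'
}
```

Usage would be along the lines of `dkctl sd1 listwedges | list_untyped_wedges`, which for the output above should name dk0.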
I'm able to work around this with a "bootdev=sd0" in /boot.cfg, but that
doesn't seem like the right way, and I don't think it should be necessary.
Google searches suggest I'm not the only person who has been tripped up
by this issue.
Am I missing something here that I could do to change the wedge
configuration to avoid this issue? Is it still so difficult to discover
which device the boot loader booted the kernel from on such a
semi-modern amd64 machine that the kernel can make such mistakes as
this? If dk(4) is auto-configuring can it not at least look to see if
there's a valid filesystem on the device before it shoves itself in the
front of the line as the supposed "boot device"? Should there be a
wedge "type" for LVM?
--
Greg A. Woods
Planix, Inc.
<[email protected]> +1 250 762-7675 http://www.planix.com/
