On 5/30/2025 6:29 AM, Chuck Zmudzinski wrote: > On 5/29/2025 4:02 PM, Greg A. Woods wrote: >> At Thu, 29 May 2025 15:01:50 -0400, Chuck Zmudzinski <frchu...@gmail.com> >> wrote: >> ... >> >>> When I pass bootdev=dk12 in boot.cfg, the bootloader strangely tries dk1 as >>> root >>> (which is wrong) and correctly detects dk11 as the dump device. But it never >>> gives me the chance to enter the correct root device and instead tries to >>> load >>> init which of course it cannot find the NetBSD init on dk1 because dk1 is >>> not >>> the correct NetBSD root device. In fact on this box a Linux distro is >>> installed >>> on dk1, as evidenced by the filesystem type detected on dk1: ext2fs. >> >> Ah, I think that's a bug related to some bizarre/old hacks to find the >> "booted_partion" for non-GPT disks: >> >> if (strncmp(xcp.xcp_bootdev, devname, strlen(devname))) >> continue; >> >> if (is_disk && strlen(xcp.xcp_bootdev) > strlen(devname)) { >> /* XXX check device_cfdata as in x86_autoconf.c? */ >> booted_partition = toupper( >> xcp.xcp_bootdev[strlen(devname)]) - 'A'; >> DPRINTF(("%s: booted_partition: %d\n", __func__, >> booted_partition)); >> } >> > ... > A very simple sanity check that might fix this case would be to reject > the match if the extra digit in the bootdev string is a numerical digit > instead of a letter of the alphabet between a and p, because only such > a letter would indicate that the two devices are related as a full disk > device and a device that is a partition on the full disk. > > Essentially, it looks like we are getting a false positive when searching > for the device on which the root partition resides, and I think maybe if > we add that extra sanity check of making sure the extra digit is a letter > between a and p instead of a numerical digit like 2 we would correctly > detect dk12 as the root device on my system instead of getting dk1 as a > false positive.
So, as long as there is no funny business with byte order and "endianness" with "booted_partition", I think the additional sanity check could be just a two-line addition between the statement when we compute "booted_partition" and the DPRINTF statement that logs the value of "booted_partition": if (booted_partition & 0xfffffff0) continue; This should work because we would expect the booted partition's value to be an unsigned integer less than 0xf. If any of the higher order 28 bits of "booted_partition" has a 1, then we know we do not have a valid partition index between 0 and 15 so we should bail out and try the next devname. Of course this also assumes "booted_partition" is declared as an uint32_t, or equivalent. I could not find its declaration in arch/xen/xen/xen_machdep.c and presumably it is declared in a header somewhere. Chuck Zmudzinski