Hi Doc,

On Thu, 29 Jan 2026 at 23:54, D. R. Evans <[email protected]> wrote:
> David wrote on 1/29/26 3:55 PM:
> > Can you please report the result of these interactive grub-rescue
> > commands, after booting your system with only the drive being rescued
> > connected:
> >
> >   ls (hd0)/
> >   ls (hd0,msdos1)/
> >   ls (hd0,msdos2)/
> >   ls (md/0)/
>
> They all return:
>
>   error: unknown filesystem

Ok, I take that to mean that the grub "second portion" that currently
exists on your drive is incapable of reading any of the filesystems it is
aware of on that drive. This is a strange situation, and I am not sure how
it has arisen (some comments on that below), but at least concrete facts
are progress.

For context, I'm going to quote some prior info you provided, and I have
interleaved some comments.

On Thu, 22 Jan 2026 at 17:11, D. R. Evans <[email protected]> wrote: [a]
> I had a linux-raid two-drive system that was working fine for many years.
> The system uses legacy BIOS booting. My notes from long ago say that both
> drives had a working GRUB; but it seems that my notes were wrong: one of
> the drives died without warning, leaving me with a drive with a
> fully-functioning trixie (and all the user data, etc.) present, but that
> drive seems to have no working GRUB in the MBR. Trying to boot it gives
> me a "grub-rescue>" prompt.

[...]

> I can physically remove the drive and place it on a functioning machine,
> and have done so. With the drive in the functioning machine, I have
> checked that indeed all the data on it (that were in the original "/"
> hierarchy) are readable. So I just want to find a way to install GRUB on
> the MBR in a way that will cause the disk to be bootable into the system
> that was on it. That is, I want to be able to remove the disk from the
> functioning machine that it's currently (temporarily) on, put the drive
> back in the original machine, power on, and have the system come up as it
> used to (except now with just one active drive in the RAID array).
>
> From there I can add a new drive to the array and get myself back a
> fully-functioning two-drive RAID-based system.

Before we get lost in all the details again, can we do a sanity check: do
you really need to take this approach? You say that the old disk is
readable when put into another functioning machine, so another approach
would be to build a brand new RAID system and just recover whatever old
data you need off the old disk. Do you really need to get the old disk to
boot? Note that if you make a mistake while continuing the attempts to get
the old disk to boot, you might corrupt it and lose the ability to read
any old data off it.

On Thu, 22 Jan 2026 at 21:40, D. R. Evans <[email protected]> wrote: [b]
> alain williams wrote on 1/22/26 11:01 AM:
> > You do not say what sort of RAID you are using, but you have 2 disks so
> > I assume RAID-1 (mirrored disks).
>
> Yes, you are correct: RAID-1.

[...]

> The command "parted -l" gives:
>
>   Model: ATA HGST HDN724030AL (scsi)
>   Disk /dev/sda: 3001GB
>   Sector size (logical/physical): 512B/4096B
>   Partition Table: msdos
>   Disk Flags:
>
>   Number  Start   End     Size    Type     File system  Flags
>    1      1049kB  16.0GB  16.0GB  primary               boot, raid
>    2      16.0GB  2000GB  1984GB  primary               raid
>
> All the system info is in that second partition. I don't rightly recall
> why the first partition is present (it's been an awfully long time since
> I installed this disk). I suspect that it's reserved for swap, although
> I doubt that swapping has ever occurred.

[...]

> I had a spare hard drive on which I installed a pristine copy of trixie
> (on ext4), and that's what I've booted from.

[...]
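Interjecting here: while the disk is sitting in the working machine, there
are a couple of safe read-only checks that would fill in some gaps,
including what is on that mystery first partition. This is only an
illustrative sketch: the /tmp/P1 mount point is an example name I made up,
and the mount assumes /dev/sda1 holds a mountable filesystem at all:

  lsblk -f /dev/sda             # show FSTYPE of sda and each partition
  mkdir -p /tmp/P1
  mount -r /dev/sda1 /tmp/P1    # readonly, to avoid accidents
  ls -la /tmp/P1
  umount /tmp/P1

  # read-only: report the md superblock on the RAID member partition; the
  # exact output format varies, but the metadata version matters later,
  # because grub reads 0.90-format arrays with its mdraid09 module but
  # 1.x-format arrays with mdraid1x
  mdadm --examine /dev/sda2 | grep -iE 'version|array uuid'

If that mount fails with an unknown-filesystem error, that in itself is
useful information. I ask more about /dev/sda1 below.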
> > Identify the hard disk that contains your system, look at
> > /proc/partitions.
>
>   major minor    #blocks  name
>
>      2     0          4  fd0
>      8    16 1953514584  sdb
>      8    17 1945125888  sdb1
>      8    18          1  sdb2
>      8    21    8385591  sdb5
>      8     0 2930266584  sda
>      8     1   15624192  sda1
>      8     2 1937889280  sda2
>     11     0    1048575  sr0
>     11     1    1048575  sr1
>      9   126 1937757184  md126
>
> sdb (size 2TB) is the drive from which I booted. sda (size 3TB) is the
> old RAID drive.

On Fri, 23 Jan 2026 at 15:30, D. R. Evans <[email protected]> wrote: [c]
> Based on the above output from /proc/partitions, I think that the root
> filesystem is best described as /dev/md126.
>
> > mkdir /tmp/RFS
>
> OK
>
> > mount /dev/sda1 /tmp/RFS

For just inspecting a disk I'd suggest 'mount -r' for a readonly mount,
to avoid accidents.

> I changed this to: mount /dev/md126 /tmp/RFS
>
> I note that if I try to execute "mount /dev/sda2 /tmp/RFS" instead, then
> I get an error about "linux_raid_member" being an unknown filesystem
> type. But if I call that partition "md126", then the mount proceeds OK.

Ok, so we know that /dev/sda2 is "linux_raid_member", and not readable as
an ext2/3/4 device.

I am curious to know what is on /dev/sda1; you seem unclear about that,
so maybe you could investigate. I notice above that it has a boot flag.
What happens when you try to mount /dev/sda1? I suggest a similar
'mount -r' inspection (per the sketch I interleaved above) to the one you
show below for /dev/sda2. What does the 'lsblk -f' command say about disk
sda and its partitions and their FSTYPE?

> At this point, if I look at the contents of /tmp/RFS, I see (wrapped by
> the e-mail agent):

[...]

> and /tmp/RFS/boot/grub:
>
>   total 2428
>   drwxr-xr-x 5 root root    4096 Jan 10 08:32 .
>   drwxr-xr-x 3 root root   12288 Jan 10 08:34 ..
>   drwxr-xr-x 2 root root    4096 Feb 28  2020 fonts
>   -r--r--r-- 1 root root   16758 Jan 10 08:32 grub.cfg
>   -rw-r--r-- 1 root root    1024 Jan 20 11:44 grubenv
>   drwxr-xr-x 2 root root   24576 Sep  8 15:25 i386-pc
>   drwxr-xr-x 2 root root    4096 Sep  8 15:25 locale
>   -rw-r--r-- 1 root root 2411806 Sep  8 15:25 unicode.pf2

For your info, the "third portion" [1] of grub is the set of module files
contained in the i386-pc directory you see above. It would be interesting
to know what grub version is in that directory; you could try something
like the example below, which is what I see on this machine:

  [root@kablamm i386-pc]# strings normal.mod | grep -C 1 version
  not in normal environment
  GNU GRUB version %s
  2.06-13+deb12u1

On Sun, 25 Jan 2026 at 20:47, D. R. Evans <[email protected]> wrote: [d]
> I found my notes from when I first installed Linux software RAID on the
> machine in question (in 2013), and, without going into details, it
> basically was simply done as a RAID1 installation from what was at the
> time the official debian installer. I had to do it a couple of times to
> get the partitioning on the two drives right. Both of the drives in that
> original installation were 2TB. My memory about the partitioning was
> correct: the two original drives each had two partitions: i) 16GB, which
> was used as swap on one drive, and merely formatted as ext3, but unused,
> on the other; then the rest of each drive was the RAID mirror.
>
> A few years later, one of the drives started to generate SMART errors,
> so it was replaced. Although both the original drives were 2TB, I had to
> use a 3TB drive as a replacement, because when I tried to use a
> replacement 2TB drive, something complained in the process that there
> was insufficient space -- presumably because of disk-to-disk variation
> in usable space.
> That replacement 3TB drive is the one that I am now trying to make boot.
> My notes don't say that I ever tried to make that drive bootable, which
> was an obvious error on my part, and why I am now in this situation. (I
> probably knew it had to be done, and didn't have time, and then forgot
> all about it.)

This replacement 3TB drive, was it new and unused, or had it been used
previously? I ask because you initially showed:

On Sat, 24 Jan 2026 at 20:42, D. R. Evans <[email protected]> wrote: [e]
> If I try to boot from the RAID disk, which drops me into grub rescue,
> then type "ls", it responds with:
>
>   (hd0) (fd0)
>
> If I type "set", it responds with:
>
>   prefix='(hd0)/BOOT/debain@/grub'
>   root='hd0'

so I wonder at what point those "second portion" values were created,
where they came from, and whether they are relevant to your RAID
installation in any way (I doubt it) or a relic from some unrelated prior
use of that disk (I suspect so). Unfortunately you never showed us any
attempt to use 'ls' in grub-rescue to inspect the filesystems on the
drive in that state, so we don't know whether that grub could read them
or not.

And then, with two drives installed, you showed:

On Sun, 25 Jan 2026 at 18:01, D. R. Evans <[email protected]> wrote: [f]
> Having gone through the procedure in the other sub-thread, and thereby
> installing a new MBR, what I now see at the "grub rescue>" prompt is
> quite different (I have to take a photo of the screen, upload it to this
> computer, and type what it says into this e-mail; I think I have done
> that without any mistakes) [recall from that sub-thread that when the
> RAID MBR was written there were two drives in the machine: /dev/sda was
> the non-booting RAID disk, and /dev/sdb was the non-RAID disk that had
> supplied the running OS]:
>
> "ls" now returns:
>
>   (hd0) (hd0,msdos2) (hd0,msdos1) (hd1) (hd1,msdos5) (hd1,msdos1) (md/0) (fd0)
>   error: failure reading sector 0xb30 from 'fd0'.
>
> I interpret that to mean, in order:
>   the entire RAID disk (/dev/sda)
>   the second partition on the RAID disk (/dev/sda2)
>   the first partition on the RAID disk (/dev/sda1)
>   the entire non-RAID disk (/dev/sdb)
>   the fifth partition on the non-RAID disk (/dev/sdb5)
>   the first partition on the non-RAID disk (/dev/sdb1)
>   the logical RAID disk
>   the floppy drive
> and there was an error accessing the floppy, probably because there was
> nothing in the drive.
>
> "set" now returns ("8d86...0aed" is shorthand for a long UUID):
>
>   prefix='(mduuid/8d86...0aed)/boot/grub'
>   root='mduuid/8d86...0aed'

And now, with only one drive installed, you are showing:

On Thu, 29 Jan 2026 at 18:27, D. R. Evans <[email protected]> wrote: [g]
> Following several attempts to use a separate drive/CD/USB stick to get
> things working (so far without success), the current situation if I take
> everything else out, and just leave the RAID drive in the machine, and
> boot to the "grub rescue>" prompt, I now see (I have to type everything
> manually into this e-mail; I'll obviously try to be really careful about
> that):
>
> ls:
>
>   (hd0) (hd0,msdos2) (hd0,msdos1) (md/0) (fd0)
>   error: failure reading sector 0xb30 from 'fd0'
>
> The fd error is, I'm sure, irrelevant.
>
> The actual RAID filesystem from which we want to boot is the second
> partition on the hard drive: (hd0,msdos2).
> set: [in this output I'll use "<UUID>" to mean a UUID that it prints,
> which starts with "8d86" and ends with "0aed"]
>
>   prefix='(mduuid/<UUID>)/boot/grub'
>   root='mduuid/<UUID>'

So, by contrast to the earlier [e], it seems that [f] and [g] show that
you overwrote the previous "second portion" (I assume using a grub-install
command) with a new one which is better able to read the partition table,
and so finds the partitions (hd0,msdos1) and (hd0,msdos2) that were not
shown previously. However, I suspect that this new "second portion" is
still useless, because whatever grub-install command you used was not
nuanced enough for the situation: with two drives and a mixture of RAID
and non-RAID partitions present at the time, grub had plenty of scope to
get confused. The "second portion" now sees two additional partitions
that it didn't see before, and an additional (md/0) device, but cannot
read any of their contents.

If you want to take the risk (e.g. possibly corrupting your data further;
please take a backup somehow first), then you could try using a more
careful grub-install command, but you will need to do this with the two
disks back in the machine, as you showed at [f]. That is because this
method from [h] won't work:

On Thu, 29 Jan 2026 at 11:34, Alexander V. Makartsev <[email protected]> wrote: [h]
> I imagine you could type at grub rescue prompt something like this:
>
>   grub> set pager=1
>   grub> insmod normal
>   grub> insmod part_msdos
>   grub> insmod diskfilter
>   grub> insmod mdraid1x
>   grub> insmod ext2
>   grub> ls
>   (proc) (hd0) (hd0,msdos1) (hd0,msdos2) (md/0)
>   grub> set prefix=(md/0)/boot/grub
>   grub> set root=(md/0)
>   grub> linux ($root)/boot/vmlinuz-6.12.63+deb13-amd64 root=/dev/md126
>   grub> initrd ($root)/boot/initrd.img-6.12.63+deb13-amd64
>   grub> boot

It won't work because the 'grub>' prompt is not the same as the
'grub rescue>' prompt. grub-rescue will have appeared *because* grub
cannot find any "third portion" (including the modules normal,
part_msdos, diskfilter, mdraid1x, ext2, and others), so trying to
'insmod' them will fail, as David Wright explained:

On Fri, 30 Jan 2026 at 01:00, David Wright <[email protected]> wrote: [i]
> That's worrying. A Grub rescue prompt could well fail because it needs
> modules to read the filesystems, but needs to read the filesystems to
> find its modules, which is why it needs a human to help it search every
> nook and cranny of all the devices for some modules to load.

So if:

- you have the two disks back in the machine, and
- you have booted it and confirmed that they appear exactly as you
  showed at [f], and
- you have backups and are prepared to risk data loss,

then you could try something like this, using ideas from [j]:

  grub-install --modules='part_msdos diskfilter ext2 mdraid1x' /dev/sda

We cannot give that command a --boot-directory argument (which would
create a 'prefix' value) at this time, because we do not know what value
to specify, and its grub device might be wrong.

Then remove the good drive and try to boot again using only the faulty
drive. It will fail and drop you into grub-rescue, because we have not
provided a correct 'prefix' value yet. But there is a chance that
retrying the procedure I described in [k] (at "What I would do is ...")
will be more successful this time, and maybe one of the filesystems will
be readable by grub-rescue, allowing you to then try the procedure I
described in [m]. If that succeeds it will allow you to boot one time
only.
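For a concrete picture, the classic hand-start sequence from grub-rescue
looks something like the following (this is the generic shape, not
necessarily exactly what [m] says). The (md/0) device name is taken from
your output at [f]/[g], but discovering which 'prefix' and 'root' values
actually work is the whole point of the exercise, so treat these values
as placeholders, not a recipe:

  grub rescue> set prefix=(md/0)/boot/grub
  grub rescue> set root=(md/0)
  grub rescue> insmod normal
  grub rescue> normal

If 'normal' loads, you should get the ordinary grub menu (or at least the
full 'grub>' shell, where commands like those in [h] become possible) and
can boot the system once from there.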
If that works, you will then need to run a different grub-install command
to permanently embed into your "second portion" the 'prefix' and 'root'
values that you discovered.

After that you would need to recover your RAID-1, about which I know
nothing beyond what I read at [n], which says:

  Since support for MD is found in the kernel, there is an issue with
  using it before the kernel is running. Specifically it will not be
  present if the boot loader is either (e)LiLo or GRUB legacy. Although
  normally present, it may not be present for GRUB 2. In order to
  circumvent this problem a /boot filesystem must be used either without
  md support, or else with RAID1. In the latter case the system will boot
  by treating the RAID1 device as a normal filesystem, and once the
  system is running it can be remounted as md and the second disk added
  to it. This will result in a catch-up, but /boot filesystems are
  usually small.

Footnotes:

[1]: I will persist with this "Nth portion of grub" phrasing that we
introduced previously because, although it is not standard, it reduces
confusion with some outdated info about grub on the web that uses terms
like "Stage N". My "second portion" is the same as the "core image", aka
"core.img", which usually lives outside any filesystem. There is a good
diagram of GRUB2 MBR boot at
https://en.wikipedia.org/wiki/BIOS_boot_partition

[a]: https://lists.debian.org/debian-user/2026/01/msg00377.html
[a]: [email protected]
[b]: https://lists.debian.org/debian-user/2026/01/msg00392.html
[b]: [email protected]
[c]: https://lists.debian.org/debian-user/2026/01/msg00420.html
[c]: [email protected]
[d]: https://lists.debian.org/debian-user/2026/01/msg00484.html
[d]: [email protected]
[e]: https://lists.debian.org/debian-user/2026/01/msg00449.html
[e]: [email protected]
[f]: https://lists.debian.org/debian-user/2026/01/msg00475.html
[f]: [email protected]
[g]: https://lists.debian.org/debian-user/2026/01/msg00557.html
[g]: [email protected]
[h]: https://lists.debian.org/debian-user/2026/01/msg00551.html
[h]: [email protected]
[i]: https://lists.debian.org/debian-user/2026/01/msg00578.html
[i]: [email protected]
[j]: https://unix.stackexchange.com/questions/17481/grub2-raid-boot
[k]: https://lists.debian.org/debian-user/2026/01/msg00574.html
[k]: [email protected]
[m]: https://lists.debian.org/debian-user/2026/01/msg00468.html
[m]: CAMPXz=qk7dn4jjcrdkelajha2bgi8_dg2gvhow-wo0gm3iz...@mail.gmail.com
[n]: https://en.wikipedia.org/wiki/Mdadm

