Thanks for the info, John.

Here's some more background on our setup.  It's a semi-embedded system (i.e., 
no "system admin" - all admin has to be handled by our programs/scripts) that 
is installed on hundreds/thousands of devices around the world (so doing things 
manually is possible but very expensive).

For the most part, we're using stock Debian 8 (Jessie).  There's a slight 
modification to the grub scripts in /etc/grub.d: I added a line that sources 
my own script just after the point where the "grub-mkconfig_lib" library is 
sourced.
The contents of the new script are:
make_system_path_relative_to_its_root ()
{
    "${grub_mkrelpath}" "$1" | sed -e 's,/@[^/]*,/@,'
}

I wrote this quite a few years ago, so I don't remember *how* it works, but I 
remember it was required for a reason that I'll get to shortly.

Since we're using btrfs for the filesystem, we have it set up so that the 
linux-root (I'll use this to refer to what linux mounts at /) is actually a 
subvolume.
The root of the btrfs filesystem has these things in it:
@        A symlink to the subvolume we want to boot into
boot     A symlink to "/@/boot"
@deb8    One of our linux-root subvolumes
@deb10   Another of our linux-root subvolumes
(Really, the linux-root subvolumes could be named anything.)

In all the linux-roots, /etc/fstab has this:
UUID=...  /  btrfs  defaults,subvol=@  0 1
The patch to grub that I mentioned at the top of the email is there to prevent 
grub from "resolving" the @ symlink to the real subvolume path when generating 
the /boot/grub/grub.cfg file.
So our generated grub.cfg ends up with entries like this:
linux /@/boot/vmlinuz-3.16.0-9-amd64 root=UUID=06307c88-ee37-4a15-ada7-83bf6d8c2955 ro single rootflags=subvol=@
(Without my patch, the path would be /@deb10/boot/vmlinuz-... instead.)
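
To show what the sed expression in my script does to a path, here is a small 
stand-alone sketch.  The function name collapse_subvol is made up for this 
example; in the real script the input comes from "${grub_mkrelpath}" rather 
than from printf.

```sh
#!/bin/sh
# collapse_subvol is a hypothetical stand-in for the sed stage of the
# make_system_path_relative_to_its_root override shown earlier.
collapse_subvol() {
    # Replace the first "/@<name>" path component with a bare "/@".
    printf '%s\n' "$1" | sed -e 's,/@[^/]*,/@,'
}

collapse_subvol /@deb10/boot/grub/grub.cfg   # prints /@/boot/grub/grub.cfg
collapse_subvol /@deb8/boot/grub/grub.cfg    # prints /@/boot/grub/grub.cfg
collapse_subvol /@/boot/grub/grub.cfg        # already collapsed, unchanged
collapse_subvol /boot/grub/grub.cfg          # no "/@" component, unchanged
```

So whichever subvolume the path was resolved to, the generated entry goes 
through the @ symlink instead.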

Now here's where the problem starts.  In the @deb8 subvolume, linux-3.16... is 
installed and in the @deb10 subvolume, linux-4.19... is installed.  The 
/@deb8/boot/grub/grub.cfg and /@deb10/boot/grub/grub.cfg files are both correct.
However, if grub-install is run while booted into the @deb10 subvolume and we 
then "switch" back to @deb8 (by changing that @ symlink and rebooting), grub 
starts up using the config from /@deb10/boot/grub/grub.cfg instead of the one 
we'd expect (from /@deb8/boot/grub/grub.cfg).

I'm not at all up to speed on how grub actually executes at boot, but I'm aware 
there are executable bits "outside" of the filesystem (in the MBR or 
somewhere?).   My theory is that those pieces hold an inode number for 
/boot/grub/grub.cfg from when grub-install was run (so that grub doesn't need 
lookup code to find the data given the filepath "/boot/grub/grub.cfg").   Or 
maybe there's some other kind of config caching?
We've found that if we run "update-grub" and "grub-install /dev/sda" whenever 
we "switch" snapshots, the problem doesn't occur.
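
Roughly, the workaround amounts to something like the sketch below.  This is 
not our actual script: switch_root, run, TOP, DISK and DRY_RUN are names I 
made up for illustration, the mount point of the btrfs top level is an 
assumption, and running the two grub commands right after repointing the 
symlink (rather than after the reboot) is also just an assumption of this 
sketch.

```sh
#!/bin/sh
# Sketch only: all names and defaults below are hypothetical.
TOP=${TOP:-/mnt/btrfs-top}   # assumed mount point of the btrfs top level
DISK=${DISK:-/dev/sda}       # boot disk, as in the grub-install call above
DRY_RUN=${DRY_RUN:-1}        # 1 = only print the commands, do not run them

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

switch_root() {
    target=$1                        # name of the subvolume, e.g. @deb8
    run ln -sfn "$target" "$TOP/@"   # repoint the @ symlink
    run update-grub                  # regenerate /boot/grub/grub.cfg
    run grub-install "$DISK"         # rewrite grub's boot code on the disk
}
```

With the default DRY_RUN=1, calling "switch_root @deb8" only prints the three 
commands, so the sequence can be checked before anything is changed.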
