On Tue, Nov 30, 2021 at 08:18:56PM +0100, Ben Hutchings wrote:
> On Tue, 2021-11-30 at 11:01 +0100, Fabian Grünbichler wrote:
> [...]
> > possibly interesting in that context (I asked/posted the link in
> > #debian-kernel a few days ago as well) - these BTF sections now actually
> > reference the BTF info in the kernel image itself (as part of the
> > deduplication of shared information), which makes the latter part of the
> > ABI, and AFAICT this is not (yet?) tracked in Debian..
> >
> > https://lore.kernel.org/all/1637926692.uyvrkty41j.astr...@nora.none/
> >
> > an otherwise ABI compatible kernel upgrade thus has the potential to
> > break module loading altogether, and I'd recommend disabling the split
> > BTF feature for the time being unless you plan on bumping ABI for every
> > kernel update anyway.
>
> Yes, that is interesting/concerning.
>
> If we continue to not bump the ABI number on every update, then I
> think:
>
> 1. In-tree modules should not be loadable between an upgrade and a
>    reboot. (This can happen already for specific modules, due to symbol
>    version changes that we think don't affect out-of-tree modules.)
>    Alternatively, they could still be loadable but then their BTF info
>    should be completely discarded.
>
> 2. Out-of-tree modules should be built without BTF deduplication, or
>    without BTF info.
>
> The main reason for not bumping the ABI number every time is to avoid
> forcing an unnecessary rebuild of out-of-tree modules. We could try
> switching to something like RHEL's "weak-update" mechanism where ABI-
> compatible out-of-tree modules are automatically linked into a new
> version's modules directory without rebuilding them. In that case we
> would still need to implement item (2) above.

FWIW, I ran into this issue for real on a Sid system:

booted kernel:

Linux host 5.15.0-2-amd64 #1 SMP Debian 5.15.5-1 (2021-11-26) x86_64 GNU/Linux

installed kernel:

ii  linux-image-5.15.0-2-amd64  5.15.5-2  amd64  Linux 5.15 for 64-bit PCs (signed)

attempting to (auto-)load any module that was not already loaded before
the upgrade:

Dec 26 17:18:48 host mtp-probe[319902]: checking bus 4, device 3: "/sys/devices/pci0000:00/0000:00:01.2/0000:02:00.0/0000:03:08.0/0000:05:00.3/usb4/4-4"
Dec 26 17:18:48 host mtp-probe[319902]: bus: 4, device: 3 was not an MTP device
Dec 26 17:18:49 host kernel: scsi 3:0:0:0: Direct-Access Multi-Reader -0 1.00 PQ: 0 ANSI: 6
Dec 26 17:18:49 host kernel: scsi 3:0:0:1: Direct-Access Multi-Reader -1 1.00 PQ: 0 ANSI: 6
Dec 26 17:18:49 host kernel: scsi 3:0:0:2: Direct-Access Multi-Reader -2 1.00 PQ: 0 ANSI: 6
Dec 26 17:18:49 host kernel: scsi 3:0:0:3: Direct-Access Multi-Reader -3 1.00 PQ: 0 ANSI: 6
Dec 26 17:18:49 host kernel: scsi 3:0:0:0: Attached scsi generic sg0 type 0
Dec 26 17:18:49 host kernel: scsi 3:0:0:1: Attached scsi generic sg1 type 0
Dec 26 17:18:49 host kernel: scsi 3:0:0:2: Attached scsi generic sg2 type 0
Dec 26 17:18:49 host kernel: scsi 3:0:0:3: Attached scsi generic sg3 type 0
Dec 26 17:18:49 host kernel: BPF:[86226] ENUM T_CONDITION_MET
Dec 26 17:18:49 host kernel: BPF:size=4 vlen=11
Dec 26 17:18:49 host kernel: BPF:
Dec 26 17:18:49 host kernel: BPF:Invalid name
Dec 26 17:18:49 host kernel: BPF:
Dec 26 17:18:49 host kernel: failed to validate module [sd_mod] BTF: -22

module loading fails until the booted and on-disk kernel images match
again - either by downgrading the latter (to 5.15.5-1 in this case), or
by rebooting.

note that just disabling the relevant Kconfig option doesn't work in my
experience, since it will be automatically enabled again by the presence
of a split-BTF capable pahole version in the build environment. patching
the default value to 'n' does work though[0].

0: https://git.proxmox.com/?p=pve-kernel.git;a=commitdiff;h=bc1d1913898940cabcea142f75a2a4759790a503
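
in case it helps anyone check whether a machine is in the state described above (booted image and on-disk image from different package versions under the same ABI name), here is a rough Python sketch. the function names are made up for illustration, and it assumes the running kernel's banner contains "Debian <package version>" as in the uname output quoted above; other distributions format this differently.

```python
import re


def booted_package_version(uname_output):
    """Return the Debian package version embedded in `uname -a`/`uname -v`
    output, or None if no "Debian <version>" token is present.

    Assumption: the banner looks like the one quoted in this mail,
    e.g. "... #1 SMP Debian 5.15.5-1 (2021-11-26) ...".
    """
    match = re.search(r"Debian (\S+)", uname_output)
    return match.group(1) if match else None


def images_differ(uname_output, installed_version):
    """True if the booted kernel was built from a different package version
    than the one currently installed on disk (hypothetical helper)."""
    booted = booted_package_version(uname_output)
    return booted is not None and booted != installed_version


if __name__ == "__main__":
    banner = ("Linux host 5.15.0-2-amd64 #1 SMP Debian 5.15.5-1 "
              "(2021-11-26) x86_64 GNU/Linux")
    # Values taken from the report above: booted 5.15.5-1, installed 5.15.5-2.
    print(images_differ(banner, "5.15.5-2"))
```

the installed version would have to come from the package database, e.g. `dpkg-query -W -f='${Version}' linux-image-5.15.0-2-amd64`. a version difference alone only indicates that the images differ, not that module BTF validation will actually fail.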