On Thu, Mar 26, 2026 at 09:39:07PM +0000, David Matlack wrote:
> On 2026-03-25 06:12 PM, Bjorn Helgaas wrote:
> > On Mon, Mar 23, 2026 at 11:57:54PM +0000, David Matlack wrote:
> > > Add an API to enable the PCI subsystem to participate in a Live Update
> > > and track all devices that are being preserved by drivers. Since this
> > > support is still under development, hide it behind a new Kconfig
> > > PCI_LIVEUPDATE that is marked experimental.
> ...
> > This sets "dev->liveupdate_incoming = false", and the only place we > > check that is in pci_liveupdate_retrieve(). In particular, there's > > nothing in the driver bind/unbind paths that seems related. I guess > > pci_liveupdate_finish() just means the driver can't call > > pci_liveupdate_retrieve() any more? > > liveupdate_incoming is used by VFIO in patch 10: > > https://lore.kernel.org/kvm/[email protected]/ > > Fundamentally, I think drivers will need to know that the device they > are dealing with was preserved across the Live Update so they can react > accordingly and this is how they know. This feels like an appropriate > responsibility to delegate to the PCI core since it can be common across > all PCI devices, rather than requiring drivers to store their own state > about which devices were preserved. I suspect PCI core will also use > liveupdate_incoming in the future (e.g. to avoid assigning new BARs) as > we implement more of the device preservation. Yes. It's easier to review if this is added at the point where it is used. > And in case you are also wondering about liveupdate_outgoing, I forsee > that being used for things like skipping disabling bus mastering in > pci_device_shutdown(). > > I think it would be a good idea to try to split this patch up, so there > is more breathing room to explain this context in the commit messages. Sounds good. > > > + * Don't both accounting for VFs that could be created after this > > > + * since preserving VFs is not supported yet. Also don't account > > > + * for devices that could be hot-plugged after this since preserving > > > + * hot-plugged devices across Live Update is not yet an expected > > > + * use-case. > > > > s/Don't both accounting/Don't bother accounting/ ? not sure of intent > > "Don't bother" was the intent. 
> > I suspect the important thing here is that this allocates space for
> > preserving X devices, and each subsequent pci_liveupdate_preserve()
> > call from a driver uses up one of those slots.
> >
> > My guess is this is just an allocation issue and from that point of
> > view there's no actual problem with enabling VFs or hot-adding devices
> > after this point; it's just that pci_liveupdate_preserve() will fail
> > after X calls.
>
> Yes that is correct.

Mentioning VFs in the comment is a slight misdirection when the actual
concern is just about the number of devices.

> I see that a lot of your comments are about these WARN_ONs so do you
> have any general guidance on how I should be handling them?

If it's practical to arrange it so we dereference a NULL pointer or
similar, that's my preference because it doesn't take extra code and
it's impossible to ignore.

Sometimes people add "if (!ptr) return -EINVAL;" to internal functions
where "ptr" should never be NULL.  IMO cases like that should just
assume "ptr" is valid and use it.

Likely not a practical strategy in your case.

> > > +	if (pci_WARN_ONCE(dev, !dev_ser, "Cannot find preserved device!"))
> >
> > Seems like an every-time sort of message if this indicates a driver bug?
> >
> > It's enough of a hassle to convince myself that pci_WARN_ONCE()
> > returns the value that caused the warning that I would prefer:
> >
> >	if (!dev_ser) {
> >		pci_warn(...) or pci_WARN_ONCE(...)
> >		return;
> >	}
>
> For "this should really never happen" warnings, which is the case here,
> my preference is to use WARN_ON_ONCE() since you only need to see it
> happen once to know there is a bug somewhere, and logging every time can
> lead to overwhelmingly interleaved logs if it happens too many times.

I'm objecting more to using the return value of pci_WARN_ONCE() than
the warning itself.
It's not really obvious what WARN_ONCE() should return and kind of a
hassle to figure it out, so I think it's clearer in this case to test
dev_ser directly.

> > > +	for (i = ser->nr_devices; i > 0; i--) {
> > > +		struct pci_dev_ser *prev = &ser->devices[i - 1];
> > > +		int cmp = pci_dev_ser_cmp(&new, prev);
> > > +
> > > +		/*
> > > +		 * This should never happen unless there is a kernel bug or
> > > +		 * corruption that causes the state in struct pci_ser to get out
> > > +		 * of sync with struct pci_dev.
> >
> > Huh.  Same comment as above.  I don't think this is telling me
> > anything useful.  I guess what happened is we're trying to preserve X
> > and X is already in "ser", but we should have returned -EBUSY above
> > for that case.  If we're just saying memory corruption could cause
> > bugs, I think that's pointless.
> >
> > Actually I'm not even sure we should check for this.

> > > +		 */
> > > +		if (WARN_ON_ONCE(!cmp))
> > > +			return -EBUSY;
>
> This is another "this should really never happen" check. I could just
> return without warning but this is a sign that something is very wrong
> somewhere in the kernel and it is trivial to just add WARN_ON_ONCE() so
> that it gets flagged in dmesg. In my experience that can be very helpful
> to track down logic bugs during development and rare race conditions at
> scale in production environments.

OK.  Maybe just remove the comment.  It's self-evident that
WARN_ON_ONCE() is a "shouldn't happen" situation, and I don't think the
comment contains useful information.

> > > +u32 pci_liveupdate_incoming_nr_devices(void)
> > > +{
> > > +	struct pci_ser *ser;
> > > +
> > > +	if (pci_liveupdate_flb_get_incoming(&ser))
> > > +		return 0;
> >
> > Seems slightly overcomplicated to return various error codes from
> > pci_liveupdate_flb_get_incoming(), only to throw them away here and
> > special-case the "return 0".
> > I think you *could* set
> > "ser->nr_devices" to zero at entry to
> > pci_liveupdate_flb_get_incoming() and make this just:
> >
> >	pci_liveupdate_flb_get_incoming(&ser);
> >	return ser->nr_devices;
>
> pci_liveupdate_flb_get_incoming() fetches the preserved pci_ser struct
> from LUO (the struct that the previous kernel allocated and populated).
> If pci_liveupdate_flb_get_incoming() returns an error, it means there
> was no struct pci_ser preserved by the previous kernel (or at least not
> that the current kernel is compatible with), so we return 0 here to
> indicate that 0 devices were preserved.

Right.  Here's what I was thinking:

  pci_liveupdate_flb_get_incoming(...)
  {
	struct pci_ser *ser = *serp;

	ser->nr_devices = 0;

	ret = liveupdate_flb_get_incoming(...);
	...
	if (ret == -ENOENT) {
		pr_info_once("PCI: No incoming FLB data detected during Live Update");
		return;
	}

	WARN_ONCE(ret, "PCI: Failed to retrieve incoming ...");
  }

  u32 pci_liveupdate_incoming_nr_devices(void)
  {
	pci_liveupdate_flb_get_incoming(&ser);
	return ser->nr_devices;
  }

> > > +++ b/include/linux/kho/abi/pci.h
> >
> > It seems like most of include/linux/ is ABI, so does kho/abi/ need to
> > be separated out in its own directory?
>
> include/linux/kho/abi/ contains all of the structs, enums, etc. that are
> handed off between kernels during a Live Update. If almost anything
> changes in this directory, it breaks our ability to upgrade/downgrade
> via Live Update. That's why it's split off into its own directory.
>
> include/linux/ is not part of the Live Update ABI. Changes to those
> headers do not affect our ability to upgrade/downgrade via Live Update.
>
> > It's kind of unusual for the hierarchy to be this deep, especially
> > since abi/ is the only thing in include/linux/kho/.
>
> Yes I agree, but that is outside the scope of this patchset I think.
> This directory already exists.

Agreed.

Bjorn

