On Wed, Nov 26, 2025 at 2:36 PM David Matlack <[email protected]> wrote:
>
> This series adds the base support to preserve a VFIO device file across
> a Live Update. "Base support" means that this allows userspace to
> safely preserve a VFIO device file with LIVEUPDATE_SESSION_PRESERVE_FD
> and retrieve a preserved VFIO device file with
> LIVEUPDATE_SESSION_RETRIEVE_FD, but the device itself is not preserved
> in a fully running state across Live Update.
>
> This series unblocks 2 parallel but related streams of work:
>
> - iommufd preservation across Live Update. This work spans iommufd,
>   the IOMMU subsystem, and IOMMU drivers [1]
>
> - Preservation of VFIO device state across Live Update (config space,
>   BAR addresses, power state, SR-IOV state, etc.). This work spans both
>   VFIO and the core PCI subsystem.
>
> While we need all of the above to fully preserve a VFIO device across a
> Live Update without disrupting the workload on the device, this series
> aims to be functional and safe enough to merge as the first incremental
> step toward that goal.
>
> Areas for Discussion
> --------------------
>
> BDF Stability across Live Update
>
> The PCI support for tracking preserved devices across a Live Update to
> prevent auto-probing relies on PCI segment numbers and BDFs remaining
> stable. For now I have disallowed VFs, as the BDFs assigned to VFs can
> vary depending on how the kernel chooses to allocate bus numbers. For
> non-VFs I am wondering if there is anything more needed to ensure BDF
> stability across Live Update.
>
> While we would like to support many different systems and
> configurations in due time (including preserving VFs), I'd like to
> keep this first series constrained to simple use-cases.
>
> FLB Locking
>
> I don't see a way to properly synchronize pci_flb_finish() with
> pci_liveupdate_incoming_is_preserved() since the incoming FLB mutex is
> dropped by liveupdate_flb_get_incoming() when it returns the pointer
> to the object, and taking pci_flb_incoming_lock in pci_flb_finish()
> could result in a deadlock due to reversing the lock ordering.

I will re-introduce the _lock/_unlock API to solve this issue.

>
> FLB Retrieving
>
> The first patch of this series includes a fix to prevent an FLB from
> being retrieved again after it is finished. I am wondering if this is the
> right approach or if subsystems are expected to stop calling
> liveupdate_flb_get_incoming() after an FLB is finished.

Thanks, I will include this fix in the next version of FLB.
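
To make the two FLB points above concrete, here is a rough userspace sketch of what a _lock/_unlock style accessor could look like. It uses pthreads rather than kernel mutexes, and every name in it (struct flb_incoming, flb_incoming_lock(), flb_incoming_unlock(), flb_incoming_finish(), struct pci_preserved, incoming_is_preserved()) is hypothetical and is not the actual FLB or PCI API. The idea it illustrates: the accessor returns the object with the FLB lock still held, finish takes that same lock, and a finished FLB can no longer be retrieved.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an incoming FLB; not the kernel's type. */
struct flb_incoming {
	pthread_mutex_t lock;	/* protects obj and finished */
	void *obj;		/* preserved data, NULL once finished */
	bool finished;
};

/*
 * Hypothetical _lock accessor: returns the object with the lock still
 * held, so a concurrent finish cannot tear it down while the caller is
 * reading it. Returns NULL, with the lock NOT held, once finished.
 */
static void *flb_incoming_lock(struct flb_incoming *flb)
{
	pthread_mutex_lock(&flb->lock);
	if (flb->finished) {
		pthread_mutex_unlock(&flb->lock);
		return NULL;
	}
	return flb->obj;
}

/* Must only be called after flb_incoming_lock() returned non-NULL. */
static void flb_incoming_unlock(struct flb_incoming *flb)
{
	pthread_mutex_unlock(&flb->lock);
}

/* Finish takes the same single lock, so no second lock needs ordering. */
static void flb_incoming_finish(struct flb_incoming *flb)
{
	pthread_mutex_lock(&flb->lock);
	flb->obj = NULL;
	flb->finished = true;
	pthread_mutex_unlock(&flb->lock);
}

/* Hypothetical preserved-device table, keyed by BDF. */
struct pci_preserved {
	unsigned int bdf[4];
	size_t count;
};

/* Example caller: look up a BDF while holding the FLB lock. */
static bool incoming_is_preserved(struct flb_incoming *flb, unsigned int bdf)
{
	struct pci_preserved *p = flb_incoming_lock(flb);
	bool found = false;

	if (!p)
		return false;	/* already finished, nothing is preserved */

	for (size_t i = 0; i < p->count; i++) {
		if (p->bdf[i] == bdf)
			found = true;
	}

	flb_incoming_unlock(flb);
	return found;
}
```

In this sketch the caller never sees the pointer without the lock held, so the finish path does not need a subsystem-private lock and the ordering problem does not arise. Whether the real API should return NULL, an error pointer, or something else after finish is left open here.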

