On Fri, 27 Feb 2026 14:19:45 -0800 David Matlack <[email protected]> wrote:
> On Fri, Feb 27, 2026 at 10:25 AM Alex Williamson <[email protected]> wrote:
> >
> > On Fri, 27 Feb 2026 09:19:28 -0800
> > David Matlack <[email protected]> wrote:
> >
> > > On Fri, Feb 27, 2026 at 8:32 AM Alex Williamson <[email protected]> wrote:
> > >
> > > >
> > > > On Thu, 26 Feb 2026 00:28:28 +0000
> > > > David Matlack <[email protected]> wrote:
> > > > > > > +static int pci_flb_preserve(struct liveupdate_flb_op_args *args)
> > > > > > > +{
> > > > > > > +	struct pci_dev *dev = NULL;
> > > > > > > +	int max_nr_devices = 0;
> > > > > > > +	struct pci_ser *ser;
> > > > > > > +	unsigned long size;
> > > > > > > +
> > > > > > > +	for_each_pci_dev(dev)
> > > > > > > +		max_nr_devices++;
> > > > > >
> > > > > > How is this protected against hotplug?
> > > > >
> > > > > Pranjal raised this as well. Here was my reply:
> > > > >
> > > > > . Yes, it's possible to run out space to preserve devices if devices
> > > > > are
> > > > > . hot-plugged and then preserved. But I think it's better to defer
> > > > > . handling such a use-case exists (unless you see an obvious simple
> > > > > . solution). So far I am not seeing preserving hot-plugged devices
> > > > > . across Live Update as a high priority use-case to support.
> > > > >
> > > > > I am going to add a comment here in the next revision to clarify that.
> > > > > I will also add a comment clarifying why this code doesn't bother to
> > > > > account for VFs created after this call (preserving VFs are explicitly
> > > > > disallowed to be preserved in this patch since they require additional
> > > > > support).
> > > >
> > > > TBH, without SR-IOV support and some examples of in-kernel PF
> > > > preservation in support of vfio-pci VFs, it seems like this only
> > > > supports a very niche use case.
> > >
> > > The intent is to start by supporting a simple use-case and expand to
> > > more complex scenarios over time, including preserving VFs. Full GPU
> > > passthrough is common at cloud providers so even non-VF preservation
> > > support is valuable.
> > >
> > > > I expect the majority of vfio-pci
> > > > devices are VFs and I don't think we want to present a solution where
> > > > the requirement is to move the PF driver to userspace.
> > >
> > > JasonG recommended the upstream support for VF preservation be limited
> > > to cases where the PF is also bound to VFIO:
> > >
> > > https://lore.kernel.org/lkml/[email protected]/
> > >
> > > Within Google we have a way to support in-kernel PF drivers but we are
> > > trying to focus on simpler use-cases first upstream.
> > >
> > > > It's not clear,
> > > > for example, how we can have vfio-pci variant drivers relying on
> > > > in-kernel channels to PF drivers to support migration in this model.
> > >
> > > Agree this still needs to be fleshed out and designed. I think the
> > > roadmap will be something like:
> > >
> > > 1. Get non-VF preservation working end-to-end (device fully preserved
> > > and doing DMA continuously during Live Update).
> > > 2. Extend to support VF preservation where the PF is also bound to
> > > vfio-pci.
> > > 3. (Maybe) Extend to support in-kernel PF drivers.
> > >
> > > This series is the first step of #1. I have line of sight to how #2
> > > could work since it's all VFIO.
> >
> > Without 3, does this become a mainstream feature?
>
> I do think there will be enough demand for (3) that it will be worth
> doing. But I also think ordering the steps this way makes sense from
> an iterative development point of view.
>
> > There's obviously a knee jerk reaction that moving PF drivers into
> > userspace is a means to circumvent the GPL that was evident at LPC,
> > even if the real reason is "in-kernel is hard".
> >
> > Related to that, there's also not much difference between a userspace
> > driver and an out-of-tree driver when it comes to adding in-kernel code
> > for their specific support requirements. Therefore, unless migration is
> > entirely accomplished via a shared dmabuf between PF and VF,
> > orchestrated through userspace, I'm not sure how we get to migration,
> > making KHO vs migration a binary choice. I have trouble seeing how
> > that's a viable intermediate step. Thanks,
>
> What do you mean by "migration" in this context?

Live migration support, it's the primary use case currently where we
have vfio-pci variant drivers on VFs communicating with in-kernel PF
drivers. Thanks,

Alex

