Hi everybody,

Here are the notes from the last Hypervisor Live Update call, which took place on Monday, December 1. Thanks to everybody who was involved!

These notes are intended to bring people who could not attend the call up to speed, as well as to keep the conversation going between meetings.

----->o-----
Pasha gave an update on the state of stateless KHO :) Jason Miu recently sent an update that received feedback internally; it will then be sent again to the upstream mailing list.

LUO v8 has been merged into stable and is scheduled for merge into 6.19 now that the merge window is open. For LUO, a few patches are still outstanding, all of which are postponed:
- end-to-end testing (one patch allows creating a VM for automatic testing)
- a change to preserve file-lifecycle-bound (FLB) global objects (no user in the original patch series)
- an internal API to retrieve struct files and get tokens for dependencies (no user)

Whichever user of file-lifecycle-bound global objects is merged first will upstream the overall support.

----->o-----
Jork asked whether the user mode tools integrate LUO into systemd. Pasha said that luod will be integrated with systemd, and the design describes how that integration would work: luod would hold the sessions across the reboot command so that the VMM can exit. Jork asked where the designs were; Pasha pointed him to the cover letter for LUO. Pasha noted that the source code for luod will be open source and added to our GitHub.

----->o-----
David Matlack updated on the status of his recently sent VFIO series[1]; he will be iterating on the feedback being provided upstream. The series is largely mechanical: it adds the plumbing to preserve the VFIO device file descriptor across live update. Actual PCI device preservation will be built on top of it.

Pasha discussed using BDF for PCI device preservation and whether we should use a path instead, which shouldn't add much overhead. He opined that we may need to preserve devices that are not PCI, like TPM.
Chris Li noted that the bus number is assigned from the ACPI table, so you would need to infer that the root is flat and that the first bus number is the slot, but that's ugly. David said that non-PCI devices would likely need their own solutions. Pasha asked whether PCI device preservation could be extended to non-PCI devices; he acknowledged that BDF can't change but still wondered whether there was a more generic way to preserve devices. David suggested that anything here could be added incrementally later.

For PCI assigned buses, Pasha suggested either ignoring this parameter or disabling live update if the parameter is set. This opened up the general question of how we should handle conflicting parameters: should this be handled by VFIO or by LUO? Chris suggested bailing out when the buses we care about, i.e. those involved in the live update, get changed. Pasha suggested we could allow the auto assignment on first boot and then ignore the parameter on live update. David suggested that whenever a PCI device needs to be preserved (a callback into the PCI subsystem), that code can check whether this option is enabled and, if so, fail. The next kernel would still need to check and perhaps panic if it is set. Pasha compared this to sanity checking the memmap for the new kernel, which would be required for consistency.

----->o-----
Samiullah gave an update on the IOMMU preservation series, which was going to be sent before LPC and builds on top of David's VFIO series above. There will be no autobind in this one, and it uses internal tokens; the LUO get token API will be integrated later. This is currently planned as an RFC. Pasha asked about the lock-unlock functionality and whether Sami thought it was lagging -- Sami said that it was actually better now because there is more flexibility. David said that the only locking in his series synchronizes finish, when the FLB is freed, with anything that is using it.
Anything using it already takes the mutex; if you add your own locking, you end up in the deadlock described in the cover letter. Sami had not encountered this issue.

----->o-----
Pratyush did not have updates on the HugeTLB + 1GB page preservation support but was hopeful there would be an update next week. He also hoped to share an RFC with an early implementation of versioning support by the end of this week or next week, to discuss at LPC.

----->o-----
Jork discussed recently measuring KHO for internal use cases and found that traversing the preserved lists and inserting them into memblock took up the majority of the blackout time. He asked about the later agenda item on deferred struct page initialization. Pasha said that item was actually unrelated to the blackout window; rather, it is an incompatibility with KHO. He acknowledged that KHO is very slow at inserting memblocks and that we need to address this scalability problem. Pratyush had some ideas on how to handle this, but none of them are easy.

----->o-----
Ackerley gave an update on guest_memfd support for 1GB HugeTLB: he was working on qualifying an internal version and was hoping to get reviews on extending xarrays to support splitting to multiple levels[2], which is a prerequisite for the series. The series is on track to be posted by early next year.

----->o-----
The December 15 instance of the meeting is canceled due to LPC travel.

The next meeting will be on Monday, December 29 at 8am PST (UTC-8); everybody is welcome: https://meet.google.com/rjn-dmzu-hgq

Topics for the next meeting:
- update on the status of stateless KHO patches from Jason Miu
- update on the status of LUO v8 for Linux 6.19, and any patches that are still pending after the upstream merge
- discussion on the design and status of luod, as well as its integration with systemd
- update on the VFIO patch series to preserve the VFIO device file descriptor across live update
- timelines for PCI device preservation on top of the VFIO patch series
- next steps for iommu persistence to build upon the VFIO patch series once that is merged
- status update on HugeTLB + 1GB page preservation support that was sent out in preparation for LPC
- continued discussion on versioning support for various components for luod to negotiate
- later: update on the status of guest_memfd support for 1GB HugeTLB pages
- later: testing methodology to allow downstream consumers to qualify that live update works from one version to another
- later: reducing the blackout window during live update, including deferred struct page initialization

Please let me know if you'd like to propose additional topics for discussion, thank you!

[1] https://lore.kernel.org/kvm/[email protected]/
[2] https://lore.kernel.org/all/[email protected]/
