Hi everybody,

Here are the notes from the last Hypervisor Live Update call that happened on Monday, January 26. Thanks to everybody who was involved!
These notes are intended to bring people who could not attend the call up to speed as well as keep the conversation going in between meetings.

----->o-----
This was a lengthy meeting with lots of discussion on various topics. There were a few quick updates to kick us off.

Pasha updated on the status of stateless KHO: Mike approved of the latest changes from Jason Miu and a refresh was on the way as long as there were no more comments. It should then be on track for inclusion in mm-new. More review will be possible even after that.

David Matlack updated on ordering issues when disabling interrupts during live update; he is working with the NV GPU team. They have a subject matter expert who is looking into the issue and will update David next week. This should not delay future VFIO and IOMMU patch series; the changes for interrupts would be incremental on top of those series.

Pasha updated on the status of luod: there were no major changes since the last meeting but Pasha was still looking for feedback on the design[1].

David then updated on VFIO v2 based on previous comments and was looking to post an update this week. This includes documentation for device preservation, to be extended for iommu in the future. He's also getting rid of the patch that does auto-probing and reworking the PCI FLB so that it is not accessed after devices are initialized.

----->o-----
Samiullah continues to work on iommu persistence; he was looking at Pratyush's series on memfd seal preservation so that will be integrated. The next series should be sent out in the next week.

He also discussed hitless replacement for iommu domains; he had connected with Baolu upstream and received feedback from Jason. Baolu will be reworking his series and then posting a new series upstream based on that. Jason noted some differences that will need to be addressed for hitless replacement to support ARM long-term. Sami noted that he'll have to look into that and discuss with Will. Jason suggested using a temporary VM ID.

----->o-----
Pratyush updated on the HugeTLB preservation RFC, which he presented at LPC. Feedback has surfaced no specific concerns. He'll continue to work on this over the next couple weeks, including tests. I asked if this was being reviewed by anybody outside of this group; Mike Rapoport noted there wasn't any feedback from core hugetlb reviewers yet. This should be in good shape within the next month or so.

He also updated on versioning. During LPC, it was clear that the full support is not yet ready to go. The luo agent would read the vmlinux of both kernels before performing the load and then can look at a special section where all the versions of the different file handlers are listed. Unless you support multiple versions, you can never roll out a new version in the fleet; the next step will be to figure out how to do this. Pratyush was leaning toward his suggestion for version negotiation. Jason suggested writing out all the versions. Pratyush said there may be a conflict but also was curious about how that would scale. Jason suggested there should actually only be two; this should be considered as an escape hatch and shouldn't be overcomplicated or involve new UAPIs. Writing all versions sounds like a useful starting point. Pasha said that adding a new UAPI could also be done later.

----->o-----
Pasha brought up topics for LSF/MM/BPF, which is coming up in May. Beyond HugeTLB preservation, he was wondering if there were additional topics to discuss there. Another possibility is guest_memfd support for HugeTLB. Pasha suggested KHO may also be in scope. Struct page initialization may be another topic to consider.

----->o-----
Stanislav Kinsburskii noted that he sent an RFC out for blocking kexec in the kernel. He said Microsoft needs this ability when the kernel deposits pages into the hypervisor: if guests are running or pages are deposited before there is proper support for these features, the kexec should be blocked.
Otherwise, the new kernel would become inconsistent and can't access pages that were deposited. He asked for any feedback on the list.

Pasha suggested using fd preservation for this instead. Stanislav noted that the kexec syscall does not shut down userspace, so even if an fd is exposed to a userspace process, kernel consistency depends on that process doing an ioctl to keep the fd; if kexec does not shut down userspace processes, there is still a way to get kernel inconsistency. He suggested that this may not be a reliable way forward.

Stanislav said that with proper integration, we have a file descriptor and if userspace closes this fd before the kexec, things are consistent. We can have a guest running and pages deposited into the hypervisor and then somebody can do a kexec. The expectation is that the new kernel boots and that it is reliable; this is not the case with their hypervisor as the memory can be leaked. This is different from the case where the VMs are KVM VMs and userspace processes. The suggestion was to have a hook to block kexec to support other approaches.

Pasha noted that if you have preserved the memory and the state then this is no longer a normal kexec, it's a live update. Stanislav asked if this is the one procedure that can ensure consistency across the live update; if one does not follow this exact procedure then they shouldn't rely on consistency after kexec. Pasha said that with LUO if userspace preserves fds then those fds are going to be preserved across the live update; if the session is not closed before the live update, then those fds are going to be preserved.

----->o-----
Next meeting will be on Monday, February 9 at 8am PST (UTC-8), everybody is welcome: https://meet.google.com/rjn-dmzu-hgq

Topics for the next meeting:

 - latest stateless KHO patches and inclusion in mm-new
 - ordering issues when disabling interrupts based on feedback from NV
 - luod design feedback and implementation next steps
 - VFIO patch series that would be incremental on top of the previous version
 - IOMMU persistence patch series
 - hitless replacement for iommu domains and series from Baolu
 - HugeTLB preservation support
 - versioning for luod to negotiate
 - live update related topics to propose for LSF/MM/BPF 2026 (hugetlb, KHO, deferred struct page initialization)
 - later: update on PCI preservation series and next steps
 - later: guest_memfd support for 1GB HugeTLB pages
 - later: testing methodology to allow downstream consumers to qualify that live update works from one version to another
 - later: reducing blackout window during live update, including deferred struct page initialization

Please let me know if you'd like to propose additional topics for discussion, thank you!

[1] https://tinyurl.com/luoddesign
