Hi everybody,

Here are the notes from the last Hypervisor Live Update call that happened 
on Monday, January 26.  Thanks to everybody who was involved!

These notes are intended to bring people who could not attend the call up
to speed, as well as to keep the conversation going in between meetings.

----->o-----
This was a lengthy meeting with lots of discussion on various topics.  
There were a few quick updates to kick us off.

Pasha updated on the status of stateless KHO: Mike approved of the latest 
changes from Jason Miu and a refresh was on the way as long as there were 
no more comments.  It should then be on track for inclusion in mm-new.  
More review will be possible even after that.

David Matlack updated on ordering issues when disabling interrupts during
live update; he is working with the NV GPU team.  They have a subject
matter expert who is looking into the issue and will update David next
week.  This should not delay future VFIO and IOMMU patch series; the
changes for interrupts would be incremental on top of those series.

Pasha updated on the status of luod: there were no major changes since the 
last meeting but Pasha was still looking for feedback on the design[1].

David then updated on VFIO v2, which is based on previous comments; he was
looking to post an update this week.  This includes documentation for
device preservation, to be extended for iommu in the future.  He's also
getting rid of the patch that does auto-probing and reworking the PCI FLB
so that it is not accessed after devices are initialized.

----->o-----
Samiullah continues to work on iommu persistence; he was looking at
Pratyush's series on memfd seal preservation, and that will be integrated.
The next series should be sent out in the next week.  He also discussed 
hitless replacement for iommu domains; he had connected with Baolu 
upstream and received feedback from Jason.  Baolu will be reworking his 
series and then posting a new series upstream based on that.

Jason noted some differences that will need to be handled long-term for
hitless replacement to support ARM.  Sami noted that he'll have to look
into that and discuss with Will.  Jason suggested using a temporary VM ID.

----->o-----
Pratyush updated on the HugeTLB preservation RFC, which he also presented
at LPC.  Feedback so far has surfaced no specific concerns.  He'll continue
to work on this over the next couple of weeks, including tests.  I asked if
this was being reviewed by anybody outside of this group; Mike Rapoport
noted there wasn't any feedback from core hugetlb reviewers yet.  This
should be in good shape within the next month or so.

He also updated on versioning.  During LPC, it was clear that full
support is not yet ready to go.  The luo agent would read the vmlinux of
both kernels before performing the load and could then look at a special
section where all the versions of the different file handlers are listed.
Unless you support multiple versions, you can never roll out a new version
in the fleet; the next step will be to figure out how to do this.
Pratyush was leaning toward his suggestion for version negotiation.

Jason suggested writing out all the versions.  Pratyush said there may be 
a conflict but also was curious about how that would scale.  Jason
suggested there should actually only be two; this should be considered
an escape hatch and shouldn't be overcomplicated or require new UAPIs.
Writing all versions sounds like a useful starting point.  Pasha said that 
adding a new UAPI could also be done later.
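
A minimal sketch of the kind of check discussed above, assuming each
kernel lists the set of versions it supports for every file handler.  The
handler names, the dictionary layout and the function name are all
hypothetical and are not taken from any posted series:

```python
def negotiate_versions(old_kernel, new_kernel):
    """Pick, per handler, the highest version both kernels support.

    Both arguments map a handler name to the set of versions that kernel
    supports (hypothetical layout).  Returns (chosen, missing): chosen maps
    handler -> agreed version, missing lists handlers with no common
    version, for which the live update should not proceed.
    """
    chosen, missing = {}, []
    for handler, old_versions in sorted(old_kernel.items()):
        common = set(old_versions) & set(new_kernel.get(handler, ()))
        if common:
            chosen[handler] = max(common)
        else:
            missing.append(handler)
    return chosen, missing


# Hypothetical example: each kernel writes out two versions per handler
# where it can, so adjacent kernels overlap on at least one of them.
old = {"memfd": {1, 2}, "vfio": {1}}
new = {"memfd": {2, 3}, "vfio": {2}}
chosen, missing = negotiate_versions(old, new)
print(chosen, missing)
```

With only two versions listed per handler, adjacent kernels overlap on one
version, which is what allows a new version to be rolled out in the fleet;
a handler without any overlap (vfio in this example) shows up in the
missing list.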

----->o-----
Pasha brought up topics for LSF/MM/BPF, which is coming up in May.  Beyond
HugeTLB preservation, he was wondering if there were additional topics to 
discuss there.  Another possibility is guest_memfd support for HugeTLB.  
Pasha suggested KHO may also be in scope.  Struct page initialization may 
be another topic to consider.

----->o-----
Stanislav Kinsburskii noted that he sent an RFC out for blocking kexec in
the kernel.  He said Microsoft needs this ability because the kernel
deposits pages into the hypervisor: if guests are running or pages are
deposited before there is proper support for these features, the kexec
should be blocked.  Otherwise, the new kernel would become inconsistent
and could not access pages that were deposited.  He asked for any feedback
on the list.

Pasha suggested using fd preservation for this instead.  Stanislav noted
that the kexec syscall does not shut down userspace, so even if an fd is
exposed to a userspace process, kernel consistency depends on that process
doing an ioctl to keep the fd; if kexec does not shut down userspace
processes then there is still a way to get kernel inconsistency.  He
suggested that this may not be a reliable way forward.

Stanislav said with proper integration, we have a file descriptor and if 
userspace closes this fd before the kexec, things are consistent.  We can 
have a guest running and pages deposited into the hypervisor and then 
somebody can do a kexec.  The expectation is that the new kernel boots and 
that it is reliable; this is not the case with their hypervisor as the
memory can be leaked.  This differs from the case where the VMs are KVM
VMs and userspace processes.  The suggestion was to have a hook to block
kexec to support other approaches.

Pasha noted that if you have preserved the memory and the state then this
is no longer a normal kexec; it's a live update.  Stanislav asked if this
is the one procedure that can ensure consistency across the live update,
so that anyone who does not follow this exact procedure shouldn't rely on
consistency after kexec.  Pasha said that with LUO, if userspace preserves
fds and the session is not closed before the live update, then those fds
are going to be preserved across the live update.

----->o-----
Next meeting will be on Monday, February 9 at 8am PST (UTC-8), everybody
is welcome: https://meet.google.com/rjn-dmzu-hgq

Topics for the next meeting:

 - latest stateless KHO patches and inclusion in mm-new
 - ordering issues when disabling interrupts based on feedback from NV
 - luod design feedback and implementation next steps
 - VFIO patch series that would be incremental on top of the previous
   version
 - IOMMU persistence patch series
 - hitless replacement for iommu domains and series from Baolu
 - HugeTLB preservation support
 - versioning for luod to negotiate
 - live update related topics to propose for LSF/MM/BPF 2026 (hugetlb,
   KHO, deferred struct page initialization)
 - later: update on PCI preservation series and next steps
 - later: guest_memfd support for 1GB HugeTLB pages
 - later: testing methodology to allow downstream consumers to qualify
   that live update works from one version to another
 - later: reducing blackout window during live update, including deferred
   struct page initialization

Please let me know if you'd like to propose additional topics for
discussion, thank you!

[1] https://tinyurl.com/luoddesign
