Hi everybody,

Here are the notes from the last Hypervisor Live Update call that happened 
on Monday, December 1.  Thanks to everybody who was involved!

These notes are intended to bring people up to speed who could not attend 
the call as well as keep the conversation going in between meetings.

----->o-----
Pasha updated on the state of stateless KHO :)  Jason Miu recently sent an 
update that received feedback internally; it will then be sent again to 
the upstream mailing list.

LUO v8 has been merged into stable and is scheduled for merge into 6.19 
now that the merge window is open.  For LUO, there are a few patches still 
outstanding: end-to-end testing (there is one patch that allows for 
creating a VM to do automatic testing) that is postponed, a change to 
preserve file life cycle bound global objects (no user in original patch 
series) that is postponed, and a patch for an internal API to retrieve 
struct files and get tokens for dependencies (no user) that is also 
postponed.

Whichever user is merged first for preserving file life cycle bound global 
objects will upstream the overall support.

----->o-----
Jork asked if user mode tools integrate LUO into systemd.  Pasha said 
that luod will be integrated with systemd; the design proposed the way it 
would be integrated.  luod would be holding the sessions through the 
reboot command so that the VMM can exit.  Jork asked where the designs 
were; Pasha pointed him to the cover letter for LUO.

Pasha noted that the source code for luod would be open source and added 
to our GitHub.

----->o-----
David Matlack updated on the status of his VFIO series[1] that was 
recently sent out; some feedback was being provided upstream that he will 
be iterating through.  This series is largely mechanical to add the 
plumbing to preserve the VFIO device file descriptor across live update.  
Actual PCI device preservation will be built on top of it.

Pasha discussed BDF for PCI device preservation and whether we should use 
a path instead, which shouldn't add much overhead.  He opined that we may 
need to preserve devices that are not PCI, like TPM.  Chris Li noted that 
the bus number is assigned from the ACPI table so you need to infer that 
the root is flat and that the first bus number is the slot, but that's 
ugly.
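
For readers less familiar with the BDF identifier discussed above, here is 
a minimal sketch of the conventional encoding (8-bit bus, 5-bit device, 
3-bit function).  The struct and helper names are made up for this 
illustration and are not taken from any of the series mentioned in these 
notes.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative only: conventional PCI bus/device/function triple. */
struct bdf {
	uint8_t bus;	/* 8 bits */
	uint8_t dev;	/* 5 bits, 0..31 */
	uint8_t fn;	/* 3 bits, 0..7 */
};

/* Pack into the usual 16-bit form: bus in the high byte, devfn in the low. */
static inline uint16_t bdf_pack(struct bdf b)
{
	assert(b.dev < 32 && b.fn < 8);
	return (uint16_t)((b.bus << 8) | (b.dev << 3) | b.fn);
}

/* Inverse of bdf_pack(). */
static inline struct bdf bdf_unpack(uint16_t v)
{
	struct bdf b = {
		.bus = (uint8_t)(v >> 8),
		.dev = (uint8_t)((v >> 3) & 0x1f),
		.fn  = (uint8_t)(v & 0x07),
	};
	return b;
}
```

Since the bus number is part of the identifier, any renumbering of buses 
between the old and new kernel changes the value, which is the concern 
behind the path-based alternative.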

David said that non-PCI devices would likely need their own solutions.  
Pasha asked if the PCI device preservation would be extendable 
to non-PCI devices if possible; he acknowledged that BDF can't change but 
still wondered if there was a more common way to preserve.  David 
suggested that anything here could be added incrementally on top later.

For PCI assigned buses, Pasha suggested ignoring this parameter or disabling 
live update if the parameter is set.  It opened up the general question of 
how we should handle conflicting parameters -- should this be handled by 
VFIO or by LUO?  Chris suggested bailing out when the buses we care about, 
i.e. those involved in the live update, get changed.  Pasha suggested we 
could allow the auto assignment on first boot and then ignore the 
parameter on live update.  David suggested that whenever a PCI device 
needs to be preserved (a callback into the PCI subsystem), that code can 
check if this option is enabled and, if so, fail.  The next kernel would 
still need to check and perhaps panic if it's set.  Pasha compared this to 
sanity checking the memmap for the new kernel which would be required for 
consistency.
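
David's suggestion above could look roughly like the sketch below.  This 
is only an illustration of the two checks: the struct, the flag, and both 
function names are hypothetical stand-ins, not an existing kernel API, and 
the choice of -EBUSY as the error code is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the bus-assignment boot parameter. */
struct pci_boot_opts {
	bool reassign_buses;
};

/*
 * Outgoing kernel: called when a PCI device is about to be preserved.
 * Refuse preservation if bus numbers may be reassigned.
 */
static inline int pci_preserve_check(const struct pci_boot_opts *opts)
{
	assert(opts);
	if (opts->reassign_buses)
		return -EBUSY;
	return 0;
}

/*
 * Incoming kernel: report whether the boot options conflict with
 * preserved device state.  What to do on conflict (e.g. panic) is
 * left to the caller.
 */
static inline bool pci_restore_conflict(const struct pci_boot_opts *opts,
					unsigned int nr_preserved)
{
	assert(opts);
	return nr_preserved > 0 && opts->reassign_buses;
}
```

The second check exists because the next kernel can be booted with a 
different command line than the one that did the preservation.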

----->o-----
Samiullah updated on the status of the IOMMU preservation series, which 
was going to be sent before LPC and builds on top of the VFIO series that 
David sent out above.  There will be no autobind in this one and it 
is using internal tokens; the LUO get token API will be integrated later.  
This is currently planned as an RFC.

Pasha asked about the lock-unlock functionality and whether Sami thought 
that this was lagging -- Sami said that it was actually better now because 
there's better flexibility.  David said that the only locking for his 
series was synchronizing finish when the FLB is freed with anything that's 
using it.  Anything using it already takes the mutex.  If you try your own 
locking, then this ends up in a deadlock that is mentioned in the cover 
letter.  Sami did not encounter this issue.

----->o-----
Pratyush did not have updates for the HugeTLB + 1GB page preservation 
support but was hopeful there would be an update in the next week.

He also said that by the end of this week or next week he was hoping to 
have an RFC to share with an early implementation for versioning support 
to discuss at LPC.

----->o-----
Jork discussed measuring KHO recently for internal use cases and found 
that traversing the preserved lists and inserting them into memblock took 
the majority of the blackout time.  He asked about the later agenda item 
for deferred struct page initialization.  Pasha said that was actually 
unrelated to the blackout window; it's rather an incompatibility with KHO.  
He acknowledged that KHO is very slow at inserting the memblocks and that 
we need to address this scalability problem.  Pratyush had some ideas how to 
handle this, but none of them are easy.

----->o-----
There was an update from Ackerley on guest_memfd support for 1GB HugeTLB: 
he was working on qualifying an internal version and was hoping to get 
reviews on extending xarrays to support splitting to multiple levels[2] 
which was a prerequisite for the series.  This was on track to post by 
early next year.

----->o-----
The December 15 instance of the meeting is canceled due to LPC travel.

Next meeting will be on Monday, December 29 at 8am PST (UTC-8), everybody
is welcome: https://meet.google.com/rjn-dmzu-hgq

Topics for the next meeting:

 - update on the status of stateless KHO patches from Jason Miu
 - update on the status of LUO v8 for Linux 6.19, any patches that are
   still pending after upstream merge
 - discussion on design and status of luod as well as its integration with
   systemd
 - update for the VFIO patch series to preserve the VFIO device file
   descriptor across live update
 - timelines for PCI device preservation on top of VFIO patch series
 - next steps for iommu persistence to build upon the VFIO patch series
   once that is merged
 - status update for HugeTLB + 1GB page preservation support that was sent
   out in preparation for LPC
 - continued discussion on versioning support for various components for
   luod to negotiate
 - later: update on status of guest_memfd support for 1GB HugeTLB pages
 - later: testing methodology to allow downstream consumers to qualify
   that live update works from one version to another
 - later: reducing blackout window during live update, including deferred
   struct page initialization

Please let me know if you'd like to propose additional topics for
discussion, thank you!

[1]
https://lore.kernel.org/kvm/[email protected]/
[2] 
https://lore.kernel.org/all/[email protected]/
