Hi everybody,

Here are the notes from the last Hypervisor Live Update call that happened on Monday, October 20. Thanks to everybody who was involved!
These notes are intended to bring up to speed people who could not attend the call, as well as to keep the conversation going between meetings.

----->o-----

I thought this instance of the meeting would be short, and I turned out to be very wrong :)

We touched on the discussion from the previous instance regarding fd dependency checking happening at the time of preserve rather than prepare. Pasha noted that the discussion continued afterwards on the upstream mailing list. The biggest change is that the ordering will be enforced by the user. The preserve function now does the heavy lifting; freeze and prepare are more for sanity checking. David Matlack asked how global state would work, since that's outside the fd. Pasha said the subsystem will be there, but there will be another mechanism that follows the lifecycle of fds of a specific type; for example, if a session has an fd of a specific type, then it will follow the lifecycle of the aggregate. This will be supported in v5.

----->o-----

Pasha updated that he had sent the KHO patches that provide the groundwork for LUO. Last week he also sent a KHO memory-corruption fix. Once those patches are merged, he will send LUO v5. He was targeting sending the next series of changes before the next biweekly sync.

----->o-----

Vipin Sharma sent out RFC patches for VFIO and was looking for feedback from the group at the next instance of the meeting. Jason was already providing feedback on the upstream mailing list.

----->o-----

We shifted to the main topic of the day, which was iommu persistence, presented by Samiullah. His slides are available on the shared drive. There was general alignment on what should be included in the next series upstream. His demonstrator so far included iommufd, iommu core, and iommu driver patches, but was only preserving root tables. He also proposed hot swap.
There was lots of discussion upstream around selection of which HWPT to preserve, the preserved HWPT and iommu domain lifecycle, fd dependencies, and LUO finish.

Pasha noted that LUO finish can now fail, which Jason asked about. Pasha said that if the fd hasn't replaced the hardware page table, then finish would have to fail. Sami noted that the HWPTs are also restored and associated with the preserved iommu domains, and this is done when the fd is retrieved. We can't restore the domain during probe, and there is no mechanism for the HWPTs to be created at boot time. Jason said that during probe you put the domains back with placeholders so the iommu core has some understanding of what the translation is.

----->o-----

During the hotswap discussion, Sami noted that once all the preserved devices have their iommu domains hot swapped, we can destroy the restored iommu domains that are no longer being used. Jason said that once the iommu domains are rehydrated back into an fd, they should have the normal lifecycle of a hardware page table in an fd: they will be destroyed when the hardware page table is destroyed, either when the fd closes or when the VMM asks for it to be destroyed. Jason noted that the VMM needs the id so that it can be destroyed. Jason suggested restoring, inside the devices, the hardware page table pointers that represent the currently attached hardware page table, and doing this when you bring back the iommufd. We should likely retain, for each hardware page table, the list of VFIO device objects linked to it, and all of this needs to be brought back. An alternative may be to serialize the devices. The IOMMU needs the VFIO devices, and this needs careful orchestration. Pasha suggested that since we have the session, and sessions have a specific order, the things without any dependencies were preserved first and the things with dependencies were preserved last; the kernel could call restore on everything from lowest to highest.
Jason said there needs to be a two-step process: the struct file needs to be brought back before you fill it. VFIO needs the iommufd to be filled before it can auto-bind, before it can complete its own restoration. Sami suggested that if we don't restore the HWPT until we have all the information, then even if it closes it goes back to the state it was in, and we would consider the iommufd not fully restored until then. Jason suggested that would require adding an iommufd ioctl to restore individual sub-objects: restore the HWPT that was preserved with this tag and give back the id; the restore would only be possible if the VFIO devices are already present inside the iommufd.

----->o-----

When discussing LUO finish, Pasha suggested we need a way to discard a session if it hasn't been reclaimed or there are exceptions. If the VM is never restored, then we will have lingering sessions that need to somehow be discarded. Jason suggested all objects be brought back to userspace before you can encounter an error. If there are problems up to that point, then the cleanest way to address them is with another kexec. Jason stressed the need for another kexec as a big hammer to be able to do recovery and cleanup. For example, if there are 10 VMs and one did not restore, do another live update to clean up the lingering VM.

----->o-----

Next meeting will be on Monday, November 3 at 8am PST (UTC-8), everybody is welcome: https://meet.google.com/rjn-dmzu-hgq

NOTE!!!
Daylight Saving Time has ended in the United States, so please check your local time carefully:

Time zones
PST (UTC-8) 8:00am
MST (UTC-7) 9:00am
CST (UTC-6) 10:00am
EST (UTC-5) 11:00am
Rio de Janeiro (UTC-3) 1:00pm
London (UTC) 4:00pm
Berlin (UTC+1) 5:00pm
Moscow (UTC+3) 7:00pm
Dubai (UTC+4) 8:00pm
Mumbai (UTC+5:30) 9:30pm
Singapore (UTC+8) 12:00am Tuesday
Beijing (UTC+8) 12:00am Tuesday
Tokyo (UTC+9) 1:00am Tuesday
Sydney (UTC+11) 3:00am Tuesday
Auckland (UTC+13) 5:00am Tuesday

Topics for the next meeting:
- update on the status of the stateless KHO RFC patches that should simplify LUO support
- update on LUO v5 and patch series sent upstream after KHO changes and fixes are staged
- VFIO RFC patch feedback based on the series sent to the mailing list a couple weeks ago
- follow up on the status of iommu persistence and any additional discussion from last time
- update on memfd preservation, vmalloc support, and the 1GB limitation
- discuss deferred struct page initialization and deferring it when KHO is enabled
- discuss guest_memfd preservation use cases for Confidential Computing and any current work happening on it, including overlap with the memfd preservation being worked on by Pratyush
- discuss any use cases for Confidential Computing where folios may need to be split after being marked as preserved during brownout
- later: testing methodology to allow downstream consumers to qualify that live update works from one version to another
- later: reducing the blackout window during live update

Please let me know if you'd like to propose additional topics for discussion, thank you!
