Hi everybody,

Here are the notes from the last Hypervisor Live Update call that happened on Monday, August 11. Thanks to everybody who was involved!

These notes are intended to bring people who could not attend the call up to speed, as well as to keep the conversation going in between meetings.

----->o-----
We discussed the current status of LUO v3, which was out for review. Pasha had pinged akpm about merging this into mm-unstable.

There were several changes from v2 -> v3 as a result of discussion upstream. Specifically, there was discussion about how file descriptors could be orchestrated independently from each other; individual fds could be moved into a prepare state before the global state changed to prepare. There were also changes so that only one userspace agent could control LUO at a time. There were also design changes around the IOMMU API based on feedback from Jason (making it more similar to iommufd for extensibility in the future).

----->o-----
We discussed the Live Update Orchestrator Daemon (luod) and the external design doc[1] that was shared upstream; this includes a description of the luod lifecycle (started early, cannot be killed by systemd, and continues after the reboot phase).

We discussed sessions in luod. Previous discussions suggested that sessions should be managed in the kernel in LUO, but there was a realization that they could be fully managed in userspace through luod itself. This is intuitive since there is only one agent that can control LUO at any given time.

Any user that wants to preserve state with LUO would then create a session through luod, identified by a UUID (a 128-bit value). This UUID is required on the other side of the live update to identify the session. Imagine a VMM establishing a session and luod then monitoring that connection so that when the connection dies, everything is removed from that session. If the VMM quits but still wants to preserve state across the live update, it can commit everything to a session so that everything associated with the VMM can be preserved (after entering the prepare phase).
This also ensures that, on the other side of the live update, the client has to provide the session ID so that it is connected to the right data. Non-privileged processes can also preserve state if the admin process makes the socket accessible to non-privileged processes such as the VMM.

luoctl was described as the CLI for interacting with luod. It allows dumping the state in JSON format, including for debugging purposes.

Pasha expressed a general concern about security, including for a compromised host; he did not believe that LUO should enforce that security boundary but rather wanted to have a security review that would help to establish luod's role for security.

There was no implementation to go with this at this point; it was only in the design phase, and people were highly encouraged to provide feedback on its design.

----->o-----
Chris Li discussed PCI preservation and the latest RFC patch series that was posted. He received great feedback on the mailing list for RFC v1, especially from Jason, including on how to handle config registers. Jason suggested starting by preserving as little state as possible rather than preserving everything from the start (MSI config registers should likely be recreated on the other side of the live update).

Jason said one of the biggest open blockers was how the kernel would resynchronize with all the state that wasn't reinitialized. For example, for MSI, we save the registers, but there was a question about what happens the next time someone allocates an interrupt. Jason suggested that the interrupts should be reset by the new kernel on the other side of the live update (interrupt preservation was never the plan so far).

Vipin Sharma asked about the statement that MSI preservation was never the plan; he suggested that KVM be able to check that some interrupt happened. Jason said the plan to date has been that interrupts are lost during KHO.
David Matlack echoed this, saying that the plan for now is that any interrupts during live update are lost. In the future, posted interrupt preservation may be possible, but this could be very complex.

There was general feedback to avoid leaving behind placeholders in the series that would have to be addressed later, potentially by other developers.

Chris asked for feedback on the scope of v2 and the minimal viable patch series to make progress. Jason suggested preserving the bus master bit. However, he suggested the very first series of patches should likely focus on allowing DMA to continue to system memory. Chris agreed with this.

Chris asked which PCI devices would be good to start with. There was general feedback to start with a network device that does interrupts and basic DMA (like e1000). Jason suggested that, after that, we shift toward PCI DMA with an IOMMU present and start talking about how to preserve the IOMMU configuration.

----->o-----
Pasha brought up the point that, in the context of Chris's RFC series, we have not yet discussed how to pass old data to the new kernel. He suggested spending time to design this. Chris said this was out of scope for the original patch series, but now that the minimal viable approach has been discussed, it would be possible to discuss this in a future call. This was not predicted to use the device tree, but likely something closer to KSTATE.

----->o-----
David Matlack noted that the deadline for proposals for the live update microconference was September 10[2].

----->o-----
Next meeting will be on Monday, August 25 at 8am PDT (UTC-7), everybody is welcome: https://meet.google.com/rjn-dmzu-hgq

Topics for the next meeting:

 - discussion on whether these recorded sessions should appear on YouTube to share rather than only on the shared drive
 - discussion on latest status of LUO for the upstream kernel after rebase on top of 6.17-rc1
 - update on feedback received for luod and the next steps as we head into implementation
 - update on the latest status of PCI preservation, registration, and initialization
 - [15 min] KSerial serialization protocol designed for exchanging data between live update kernels (BTF to extract or deposit data into the kernel native C struct with unique member id) + overlap with KSTATE and next steps for implementation
 - later: testing methodology to allow downstream consumers to qualify that live update works from one version to another
 - later: reducing blackout window during live update

Please let me know if you'd like to propose additional topics for discussion, thank you!

[1] https://tinyurl.com/luoddesign
[2] https://lpc.events/event/19/contributions/2004/