On 4/1/2026 7:12 PM, Marc-André Lureau wrote:
In Confidential Computing (CoCo) environments such as Intel TDX or AMD
SEV-SNP, hotplugged memory must be explicitly "accepted" (transitioned to
a private/encrypted state) before it can be safely used by the guest.
Conversely, before returning memory to the hypervisor during an unplug
operation, it must be converted back to a shared/decrypted state.
Converting it back to shared is not a must. The memory is going to be
unplugged, so the guest does not need to care about its state unless
there is a restriction that private memory cannot be unplugged. We have
no such restriction.
As I explained in the QEMU thread[1], the VMM needs to discard the
memory (both shared and private) on unplug. If the VMM fails to do so,
the memory is not actually unplugged and the guest is still able to
access it.
If the VMM fails to discard/remove the private memory, either
unintentionally or intentionally, that is a VMM bug. For TDX, this kind
of VMM bug can lead to a re-accept error. To make the TDX guest more
robust, we can let the guest release the memory itself on unplug, as
suggested by Paolo[2] and Kiryl[3], so that it survives even with a
buggy VMM. Converting the memory to shared is another way for the guest
to proactively "release" the private memory. But the justification for
it is not "the guest must do so".
[1] https://lore.kernel.org/qemu-devel/[email protected]/
[2] https://lore.kernel.org/lkml/CABgObfZ7_w8Q-dW=Sd4YA3P==bun1edpv7ty4eppyu8ctw6...@mail.gmail.com/
[3] https://lore.kernel.org/lkml/acprNlPP7J_ttMrz@thinkstation/
Attempting to handle memory acceptance automatically using generic
architecture-level memory hotplug notifiers (e.g., MEM_GOING_ONLINE)
is not viable for devices like virtio-mem:
1. Granularity Mismatch: virtio-mem can dynamically hot(un)plug memory
at a subblock granularity (e.g., 2MB chunks within a 128MB memory
block). Generic memory notifiers operate on the entire memory block.
2. Lifecycle Control: Memory must be explicitly accepted *before* it is
handed to the core memory management subsystem (the buddy allocator),
and it must be decrypted *before* being handed back to the device.
3. State Tracking (Offline -> Re-online): If memory is offlined and
re-onlined without proper state transitions, TDX will panic on
attempting to accept an already-accepted page
(TDX_EPT_ENTRY_STATE_INCORRECT).
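
To make the granularity mismatch in point 1 concrete, here is a minimal
user-space sketch of the arithmetic. The helper names
(subblocks_per_block, overaccepted_bytes) are hypothetical and are not
part of the virtio-mem driver; the 128MB and 2MB sizes are only the
example figures used above.

```c
#include <assert.h>
#include <stdint.h>

#define MiB (1024ULL * 1024ULL)

/* Number of subblocks that make up one memory block (illustrative helper). */
static inline uint64_t subblocks_per_block(uint64_t block_size,
                                           uint64_t subblock_size)
{
    assert(subblock_size != 0 && block_size % subblock_size == 0);
    return block_size / subblock_size;
}

/*
 * Bytes that a block-granular notifier would try to accept even though
 * the device has not plugged them (illustrative helper).
 */
static inline uint64_t overaccepted_bytes(uint64_t block_size,
                                          uint64_t subblock_size,
                                          uint64_t plugged_subblocks)
{
    assert(plugged_subblocks <= subblocks_per_block(block_size, subblock_size));
    return block_size - plugged_subblocks * subblock_size;
}
```

With a 128MB block and 2MB subblocks there are 64 subblocks per block.
If only 3 of them are plugged, a notifier acting on the whole block
would cover 122MB that the device never plugged.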
To address this, this patch implements explicit CoCo memory conversions
directly within the virtio-mem driver using set_memory_encrypted() and
set_memory_decrypted():
- During hotplug, explicitly accepts only the physically plugged subblocks
right before fake-onlining them into the buddy allocator.
- During unplug, memory is explicitly transitioned to the shared state
before being handed back to the host. If the unplug operation fails,
the driver attempts to re-accept (encrypt) the memory. If this
re-acceptance fails, the memory is intentionally leaked to prevent
confidentiality breaches or fatal hypervisor faults.
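
The unplug fallback described above can be sketched as a small
user-space model. This is not the driver code: the callbacks only stand
in for set_memory_decrypted(), the unplug request to the device, and
set_memory_encrypted(), and all type and function names here are
hypothetical. The handling of a failed conversion to shared is an
assumption of this sketch (it leaks the memory); the description above
does not spell out that case.

```c
#include <assert.h>
#include <stddef.h>

/* Final state of a subblock after an unplug attempt (hypothetical names). */
enum sb_state {
    SB_UNPLUGGED,   /* shared and handed back to the host */
    SB_PRIVATE,     /* unplug failed, re-accepted, still usable by the guest */
    SB_LEAKED,      /* state uncertain: never reused, never handed back */
};

/* Stand-ins for the real operations; each returns 0 on success. */
struct coco_ops {
    int (*to_shared)(void);   /* models set_memory_decrypted() */
    int (*unplug)(void);      /* models the unplug request to the device */
    int (*to_private)(void);  /* models set_memory_encrypted() */
};

/* Trivial stubs so the flow can be exercised without a hypervisor. */
int stub_ok(void)   { return 0; }
int stub_fail(void) { return -1; }

enum sb_state unplug_subblock(const struct coco_ops *ops)
{
    assert(ops != NULL);

    /* Assumption of this sketch: a failed conversion leaves the state unknown. */
    if (ops->to_shared() != 0)
        return SB_LEAKED;

    /* Memory is shared now; ask the device to take it back. */
    if (ops->unplug() == 0)
        return SB_UNPLUGGED;

    /* Unplug failed: try to re-accept so the guest can keep using it. */
    if (ops->to_private() != 0)
        return SB_LEAKED;   /* leak rather than reuse memory in a bad state */

    return SB_PRIVATE;
}
```

Passing a struct coco_ops whose unplug and to_private callbacks both
fail yields SB_LEAKED, which corresponds to the "intentionally leaked"
case in the description.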
This was discovered while testing virtio-mem resize with TDX guests.
The associated QEMU virtio-mem + TDX patch series is under review at:
https://patchew.org/QEMU/[email protected]/
Note that QEMU punches the guest_memfd on KVM_HC_MAP_GPA_RANGE, when the
guest memory is decrypted. There is thus no need to discard the guest_memfd
in the virtio-mem device.
This patch is a follow-up and supersedes "[PATCH 0/2] x86/tdx: Fix
memory hotplug in TDX guests".