This RFC series explores the start-private memory approach for virtio-mem
CoCo support using TDG.MEM.PAGE.RELEASE. We are seeking feedback from
Kiryl on the CoCo guest implementation, MM experts on the callback
infrastructure and virtio-mem integration, and broader virtio/CoCo
community input on the overall approach. We are not seeking x86 maintainer
review at this stage.

== Background ==

In Confidential Computing (CoCo) guests like TDX, memory hotplug
operations face unique challenges:

1. Newly added memory must be explicitly "accepted" by the guest using
TDG.MEM.PAGE.ACCEPT TDCALL before it can be safely accessed. Accessing
unaccepted memory triggers VM exits and guest crashes.
2. Hypervisor may perform no-op unplug operations, leaving old memory in
place. Re-accepting this already-accepted memory during re-plug operations
returns errors.
3. State management become much more complex, "accepted"/"unccepted" plus
"plugged"/"unplugged".
4. Initial virtio-mem memory may be start-private or start-shared.

A previous series [1][2] supports start-private memory and utilized memory
hotplug notifiers to call tdx_accept_memory() before pages are freed to
the buddy allocator. However, this approach has limitations:

1. virtio-mem operates memory at subblock granularity (e.g., 2MB chunks
within 128MB memory blocks), while generic memory notifiers operate on entire
memory blocks, causing acceptance of unplugged subblocks with no backing
memory.
2. Re-accepting already-accepted memory returns errors. Ignoring these errors
can mislead the guest into believing re-accepted memory is zeroed when it
contains stale data.

Currently, virtio-mem spec doesn't define what kind of hotplugged memory
should be supported for CoCo guest, shared or private or both. There is a
newer series [3][4] supporting start-shared memory in discuss. It converts
shared->private before online (via set_memory_encrypted-> MapGPA + ACCEPT),
and back to shared on unplug (via set_memory_decrypted).

== About this series ==

This series takes a different direction, supporting start-private memory
and addressing the limitations of previous series [1] by implementing a
callback-based infrastructure that integrates TDX memory acceptance and
release operations with proper subblock granularity. See Rick and Paolo's
discussion about using TDG.MEM.PAGE.RELEASE in [1].

The goal is not to compete with existing efforts, but rather to kick off
discussion and seek for suggestions from mm expert whether utilizing
callback-based infrastructure and PAGE.RELEASE API is a viable scheme.

We chose the generic post-plug and pre-unplug callback approach because
it provides a simple proof-of-concept that can support kexec/kdump
scenarios, though it does not support lazy acceptance. We rely on
community discussion to identify better, more upstreamable solutions if
the start-private direction is ultimately adopted.

== More details ==

**Post-plug callbacks** are registered by TDX guests during early boot and
triggered by virtio-mem after successfully requesting memory from the
hypervisor. The callback invokes tdx_accept_memory(), which performs
TDG.MEM.PAGE.ACCEPT TDCALL on the exact memory range that was plugged,
providing subblock-aware granularity. Note that tdx_accept_memory() may
not be fully self-consistent in all environments, as some pages may
remain in an "accepted" state while others do not, since page release is
not supported across all TDX module versions.

**Pre-unplug callbacks** are registered during early boot and invoked by
virtio-mem before requesting memory removal from the hypervisor. The
callback executes tdx_release_memory(), which performs
TDG.MEM.PAGE.RELEASE TDCALL with an optimization strategy that attempts
1GB/2MB page releases first before falling back to 4KB pages for maximum
efficiency. Unlike acceptance operations, tdx_release_memory() maintains
full self-consistency since page acceptance is universally supported
across TDX implementations.

**Error handling strategy** prioritizes system stability by marking the
virtio-mem device as broken whenever TDX operations fail:

1. Post-plug failures: If memory acceptance fails after successful
hypervisor allocation, the device is marked as broken to prevent memory
corruption. The hypervisor-side memory is leaked for the device lifetime.
2. Pre-unplug failures: If TDX memory release fails, the device is marked as
broken and no hypervisor unplug is attempted.
3. Hypervisor unplug failures: If the hypervisor unplug fails after
successful TDX release, the system attempts to re-accept the memory for
consistency. If re-acceptance fails, the device is marked as broken.

This approach avoids complex recovery mechanisms that could fail and
cause state corruption, choosing instead to fail safely by disabling the
device when TDX operations cannot maintain consistent state between guest
and hypervisor.

**PAGE.RELEASE configuration** requires explicit enablement by the
hypervisor during TD creation. The hypervisor must set the
CONFIG_FLAGS.PAGE_RELEASE flag in the TD's configuration to enable
TDG.MEM.PAGE.RELEASE functionality within the guest. Without this
configuration, guests cannot perform memory release operations and must
rely on the hypervisor to handle private memory release. This series
focuses on guest-side changes and does not include hypervisor
modifications, which can be added in future versions if needed.

== Testing ==
Tested with qemu [2] which supports start-private memory:
Basic memory hotplug/unplug test.
Basic kexec/kdump functions test with zero/half/full memory plugged.

Interestingly, it also pass with qemu [4] which supports start-shared memory,
because acceptance triggers memory convert implicitly, but it's slow as
implicit conversion is 4K page granularity.

== Future work ==
support lazy accept

Thanks
Zhenzhong

[1] kernel: 
https://lore.kernel.org/kvm/[email protected]/
[2] qemu: 
https://lore.kernel.org/qemu-devel/[email protected]/
[3] kernel: 
https://lore.kernel.org/lkml/[email protected]/
[4] qemu: 
https://lore.kernel.org/qemu-devel/[email protected]/


Zhenzhong Duan (6):
  mm/memory_hotplug: Add memory post-plug callback infrastructure
  mm/memory_hotplug: Add memory pre-unplug callback infrastructure
  virtio-mem: Integrate memory acceptance and release callbacks
  x86/tdx: Register memory post-plug callback for TDX guests
  x86/tdx: Register memory pre-unplug callback for TDX guests
  x86/tdx: Release private memory before private->shared conversion

 arch/x86/include/asm/shared/tdx.h |   2 +
 include/linux/memory_hotplug.h    |  21 ++++
 arch/x86/coco/tdx/tdx.c           | 174 ++++++++++++++++++++++++++++++
 drivers/virtio/virtio_mem.c       |  80 ++++++++++++--
 mm/memory_hotplug.c               |  40 +++++++
 5 files changed, 307 insertions(+), 10 deletions(-)

-- 
2.52.0


Reply via email to