On 6/24/26 4:57 PM, Gregory Price wrote:
There is no atomic mechanism to offline and remove an entire
multi-block DAX kmem device.  This is presently done in two steps:
     1. offline all
     2. remove all

This creates a race: between the two steps, another entity can operate
directly on the memory blocks and cause hot-unplug to fail or unbind to
deadlock.

Add a new 'state' sysfs attribute that enables an atomic whole-device
hotplug operation across its entire memory region.

daxX.Y/state mirrors the per-block memoryX/state ABI:
   - [offline, online, online_kernel, online_movable]
   - "unplugged" is added specifically for daxX.Y/state

The valid writable states include:
   - "unplugged":      memory blocks are not present
   - "online":         memory is online, zone chosen by the kernel
   - "online_kernel":  memory is online in ZONE_NORMAL
   - "online_movable": memory is online in ZONE_MOVABLE
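As a rough illustration only, the parsing side of this attribute could look like the standalone C sketch below. The enum, the string table and both helpers are hypothetical and not taken from the patch; in-kernel code would more likely use sysfs_match_string() on a string table. It also encodes the rule that "offline" is reportable but not writable.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of the daxX.Y/state values, not the patch's enum. */
enum dax_state {
	DAX_STATE_INVALID = -1,
	DAX_STATE_UNPLUGGED,
	DAX_STATE_OFFLINE,		/* reportable only */
	DAX_STATE_ONLINE,		/* zone chosen by the kernel */
	DAX_STATE_ONLINE_KERNEL,	/* ZONE_NORMAL */
	DAX_STATE_ONLINE_MOVABLE,	/* ZONE_MOVABLE */
};

static const char *const dax_state_names[] = {
	[DAX_STATE_UNPLUGGED]      = "unplugged",
	[DAX_STATE_OFFLINE]        = "offline",
	[DAX_STATE_ONLINE]         = "online",
	[DAX_STATE_ONLINE_KERNEL]  = "online_kernel",
	[DAX_STATE_ONLINE_MOVABLE] = "online_movable",
};

/* Exact match against the table; a trailing newline (as from echo) is ignored. */
static enum dax_state dax_state_parse(const char *buf)
{
	size_t len = strcspn(buf, "\n");
	size_t n = sizeof(dax_state_names) / sizeof(dax_state_names[0]);

	for (size_t i = 0; i < n; i++) {
		if (strlen(dax_state_names[i]) == len &&
		    strncmp(buf, dax_state_names[i], len) == 0)
			return (enum dax_state)i;
	}
	return DAX_STATE_INVALID;
}

/* Only "unplugged" and the online variants may be written. */
static bool dax_state_writable(enum dax_state s)
{
	return s != DAX_STATE_INVALID && s != DAX_STATE_OFFLINE;
}
```

Requiring an exact length match keeps "online" from matching a prefix of "online_kernel".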

Valid transitions:
   - unplugged                -> online[_kernel|_movable]
   - online[_kernel|_movable] -> unplugged
   - offline                  -> unplugged

A device can only be onlined from "unplugged", so it must be returned
there before being onlined into a different state.
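The transition rules above amount to a small state machine. A minimal standalone sketch follows, assuming hypothetical state names (they are not the patch's identifiers):

```c
#include <stdbool.h>

/* Hypothetical states mirroring daxX.Y/state; names are illustrative only. */
enum dax_state {
	DAX_UNPLUGGED,
	DAX_OFFLINE,		/* only from legacy auto_online_blocks=offline */
	DAX_ONLINE,
	DAX_ONLINE_KERNEL,
	DAX_ONLINE_MOVABLE,
};

static bool dax_is_online(enum dax_state s)
{
	return s == DAX_ONLINE || s == DAX_ONLINE_KERNEL ||
	       s == DAX_ONLINE_MOVABLE;
}

/*
 * Valid transitions, as listed in the description:
 *   unplugged                -> online[_kernel|_movable]
 *   online[_kernel|_movable] -> unplugged
 *   offline                  -> unplugged
 * Everything else is rejected.
 */
static bool dax_transition_allowed(enum dax_state from, enum dax_state to)
{
	if (from == DAX_UNPLUGGED)
		return dax_is_online(to);
	if (dax_is_online(from) || from == DAX_OFFLINE)
		return to == DAX_UNPLUGGED;
	return false;
}
```

With these rules, online_kernel -> online_movable is refused; the device has to pass through "unplugged" first.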

For backwards compatibility, the memory blocks are always created at
probe: existing tools expect them to be present after kmem binds.

"offline" is therefore a reportable state but is not writable: it only
arises from the legacy auto_online_blocks=offline policy.  Onlining
such a device through this attribute requires unplugging it first;
this is meant to encourage drivers that create DAX devices to set a
default.

Unplug is atomic across the whole device: dax_kmem_do_hotremove()
collects every added range and offlines/removes them in one operation.
Either the operation succeeds or it is entirely rolled back.
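The all-or-nothing behaviour can be modelled in a few lines of standalone C. This is a simplified sketch under stated assumptions, not the patch's dax_kmem_do_hotremove(): struct range_model and offline_range() are hypothetical, offlining is the only step allowed to fail, and the real code operates on memory blocks through the memory hotplug core rather than on flags.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one added memory range of a dax device. */
struct range_model {
	bool online;
	bool present;
	bool offline_fails;	/* simulates e.g. an unmovable page */
	bool was_online;	/* state before this unplug attempt */
};

/* Hypothetical offline step; -1 stands in for a real error such as -EBUSY. */
static int offline_range(struct range_model *r)
{
	if (r->offline_fails)
		return -1;
	r->was_online = r->online;
	r->online = false;
	return 0;
}

/*
 * Offline every range first.  If any range fails, restore the ranges
 * already offlined to their previous state and fail.  Ranges are only
 * removed once all of them are offline, so no partial unplug is visible.
 */
static int unplug_all(struct range_model *ranges, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (offline_range(&ranges[i]) != 0)
			goto rollback;
	}
	for (i = 0; i < n; i++)
		ranges[i].present = false;
	return 0;

rollback:
	while (i-- > 0)
		ranges[i].online = ranges[i].was_online;
	return -1;
}
```

Recording the previous state per range matters for the offline -> unplugged case: a rollback must not online ranges that were offline to begin with.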

Unbind Note:
   We used to call remove_memory() during unbind, which would fire a
   BUG() if any of the memory blocks were online at that time.  We
   downgrade this to a WARN in the cleanup routine and do not attempt
   hotremove if ->state is neither DAX_KMEM_UNPLUGGED nor MMOP_OFFLINE.

   The memory of an offline dax device is removed on unbind, as before.

   If the memory is online at unbind, the resources are leaked (as
   before), but we now avoid a deadlock when a memory region cannot be
   hot-removed.
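The unbind-time decision reduces to a single check on the device state. A minimal sketch, assuming hypothetical KMEM_* names in place of DAX_KMEM_UNPLUGGED/MMOP_OFFLINE and using fprintf() as a stand-in for the kernel's WARN():

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical states; the patch checks DAX_KMEM_UNPLUGGED and MMOP_OFFLINE. */
enum kmem_state { KMEM_UNPLUGGED, KMEM_OFFLINE, KMEM_ONLINE };

/*
 * Only unplugged or offline memory is eligible for hot-remove at unbind.
 * Online memory is left in place (leaked) with a warning instead of
 * hitting BUG() or blocking on a region that cannot be removed.
 */
static bool kmem_unbind_should_hotremove(enum kmem_state state)
{
	if (state == KMEM_UNPLUGGED || state == KMEM_OFFLINE)
		return true;
	fprintf(stderr, "WARN: memory still online at unbind, leaking it\n");
	return false;
}
```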

Suggested-by: Hannes Reinecke <[email protected]>
Suggested-by: David Hildenbrand <[email protected]>
Signed-off-by: Gregory Price <[email protected]>
---
  Documentation/ABI/testing/sysfs-bus-dax |  26 +++
  drivers/base/memory.c                   |   9 +
  drivers/dax/kmem.c                      | 224 ++++++++++++++++++++----
  include/linux/memory_hotplug.h          |   1 +
  4 files changed, 224 insertions(+), 36 deletions(-)

That looks good, but a question remains:

Why do we need to treat the 'unbind' call as a given?
If we know that we cannot handle online memory during unbind,
can't we just disallow unbind in that case?
I don't think it's too much to ask from an admin to offline
the memory first, _especially_ as now we have a simple knob
to do that ...

Cheers,

Hannes
--
Dr. Hannes Reinecke                  Kernel Storage Architect
[email protected]                                +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich
