On 6/24/26 4:57 PM, Gregory Price wrote:
There is no atomic mechanism to offline and remove an entire multi-block
DAX kmem device. This is presently done in two steps: (1) offline all,
(2) remove all.

This creates a race condition where another entity operates directly on
the memory blocks and can cause hot-unplug to fail / unbind to deadlock.

Add a new 'state' sysfs attribute that enables an atomic whole-device
hotplug operation across its entire memory region.

daxX.Y/state mirrors the per-block memoryX/state ABI:
  - [offline, online, online_kernel, online_movable]
  - "unplugged" - is added specifically for dax0.0/state

The valid writable states include:
  - "unplugged": memory blocks are not present
  - "online": memory is online, zone chosen by the kernel
  - "online_kernel": memory is online in ZONE_NORMAL
  - "online_movable": memory is online in ZONE_MOVABLE

Valid transitions:
  - unplugged -> online[_kernel|_movable]
  - online[_kernel|_movable] -> unplugged
  - offline -> unplugged

A device can only be onlined from "unplugged", so it must be returned
there before being onlined into a different state.

For backwards compatibility the memory blocks are always created at
probe - existing tools expect them to be present after kmem binds.

"offline" is therefore a reportable state but is not writable: it only
arises from the legacy auto_online_blocks=offline policy. Onlining such
a device through this attribute requires unplugging it first in an
effort to get drivers creating DAX devices to set a default.

Unplug is atomic across the whole device: dax_kmem_do_hotremove()
collects every added range and offlines/removes them in one operation.
The operation either succeeds or is rolled back entirely.

Unbind Note:
  We used to call remove_memory() during unbind, which would fire a
  BUG() if any of the memory blocks were online at that time. We lift
  this into a WARN in the cleanup routine and don't attempt hotremove
  if ->state is not DAX_KMEM_UNPLUGGED or MMOP_OFFLINE.

  The memory of an offline dax device is removed on unbind as before.

  If online at unbind, the resources are leaked (as before), but now
  we prevent deadlock if a memory region is impossible to hotremove.

Suggested-by: Hannes Reinecke <[email protected]>
Suggested-by: David Hildenbrand <[email protected]>
Signed-off-by: Gregory Price <[email protected]>
---
 Documentation/ABI/testing/sysfs-bus-dax |  26 +++
 drivers/base/memory.c                   |   9 +
 drivers/dax/kmem.c                      | 224 ++++++++++++++++++++----
 include/linux/memory_hotplug.h          |   1 +
 4 files changed, 224 insertions(+), 36 deletions(-)
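
For readers following the thread, the transition rules in the commit
message can be modelled in a few lines of user-space C. This is only a
sketch of the rules as described, not code from the patch: the enum,
the state_names table, parse_state() and transition_allowed() are all
hypothetical names. It also assumes that a write of the current state
(e.g. unplugged -> unplugged) is rejected, since the message does not
list such a transition; the real driver may treat it as a no-op.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical model of the daxX.Y/state values; names are illustrative. */
enum dax_state {
	ST_UNPLUGGED,
	ST_OFFLINE,
	ST_ONLINE,
	ST_ONLINE_KERNEL,
	ST_ONLINE_MOVABLE,
	ST_INVALID,
};

static const char *const state_names[ST_INVALID] = {
	[ST_UNPLUGGED]      = "unplugged",
	[ST_OFFLINE]        = "offline",
	[ST_ONLINE]         = "online",
	[ST_ONLINE_KERNEL]  = "online_kernel",
	[ST_ONLINE_MOVABLE] = "online_movable",
};

/* Map a string as it would be written to the attribute onto a state. */
static enum dax_state parse_state(const char *s)
{
	for (int i = 0; i < ST_INVALID; i++)
		if (strcmp(s, state_names[i]) == 0)
			return (enum dax_state)i;
	return ST_INVALID;
}

static bool is_online(enum dax_state s)
{
	return s == ST_ONLINE || s == ST_ONLINE_KERNEL ||
	       s == ST_ONLINE_MOVABLE;
}

/* Check a requested transition against the rules in the commit message. */
static bool transition_allowed(enum dax_state from, enum dax_state to)
{
	if (from >= ST_INVALID || to >= ST_INVALID)
		return false;

	/* "offline" is reportable but not writable. */
	if (to == ST_OFFLINE)
		return false;

	/* online[_kernel|_movable] -> unplugged, offline -> unplugged */
	if (to == ST_UNPLUGGED)
		return is_online(from) || from == ST_OFFLINE;

	/* Any online variant may only be entered from "unplugged". */
	return from == ST_UNPLUGGED;
}
```

Under this model, moving a device from "online" to "online_movable"
takes two writes ("unplugged", then "online_movable"), and a device
reported as "offline" has to be unplugged before it can be onlined.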

That looks good, but the question remains:
Why do we need to treat the 'unbind' call as a given thing?
If we know that we cannot handle online memory during unbind,
can't we just disallow unbind in that case?
I don't think it's too much to ask from an admin to offline
the memory first, _especially_ as now we have a simple knob
to do that ...

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                  Kernel Storage Architect
[email protected]                              +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich

