On 6/5/26 23:19, Gregory Price wrote:
> The dax kmem driver currently onlines memory automatically during
> probe using the system's default online policy but provides no way
> to control or query the entire region state at runtime.
>
> Additionally, there is no atomic mechanism to offline and remove
> the entire set of memory blocks together. Instead, this is presently
> done in two steps: (offline all, remove all). This creates a race
> condition where external entities can operate directly on the blocks
> and cause hot-unplug to fail.
>
> Add a new 'hotplug' sysfs attribute that allows userspace to control
> and query the entire memory region state. The writable states mirror
> the per-block /sys/devices/system/memory/memoryX/state ABI:
> - "unplugged": memory blocks are not present
> - "online": memory is online, zone chosen by the kernel
> - "online_kernel": memory is online in ZONE_NORMAL
> - "online_movable": memory is online in ZONE_MOVABLE
>
> The "unplugged" state is new and only applies to kmem/hotplug.
>
> Valid transitions:
> - unplugged -> online[_kernel|_movable]
> - online | online_kernel | online_movable -> unplugged
> - offline -> unplugged
>
> A device can only be onlined from "unplugged", so it must be returned
> there before being onlined into a different state.
>
> For backwards compatibility the memory blocks are always created at
> probe: existing tools expect them to be present once the kmem driver
> binds. When the configured policy (mhp_get_default_online_type())
> selects an online state the blocks are onlined into that policy's zone;
> when the policy is offline the blocks are created but left offline and
> the device reports the state "offline".
>
> "offline" is therefore a reportable state but is not writable: it only
> arises from the legacy auto_online_blocks=offline policy. Onlining such
> a device through this attribute requires unplugging it first.
>
> The "offline" state may be deprecated later if the memory block ABI
> changes and userland migrates to using the region-wide hotplug.
>
> Unplug is atomic across the whole device: dax_kmem_do_hotremove()
> collects every added range and offlines/removes them in one operation
> via offline_and_remove_memory_ranges(). Either all ranges are removed
> and the device becomes "unplugged", or offlining is rolled back and the
> device is left fully online, so the reported 'hotplug' state always
> matches reality.
>
> Unbind Note:
> We used to call remove_memory() during unbind, which would fire a
> BUG() if any of the memory blocks were online at that time. We lift
> this into a WARN in the cleanup routine and don't attempt hotremove
> if ->state is not DAX_KMEM_UNPLUGGED or MMOP_OFFLINE. Memory that is
> merely offline (the legacy "offline" state) is removed on unbind as
> before; only online memory is left pinned.
>
> The resources are still leaked but this prevents deadlock on unbind
> if a memory region happens to be impossible to hotremove.
>
> Inconsistency Note:
>
> Since memory blocks can still be modified individually, the hotplug
> attribute can become out of sync with the state of the system if
> userland software mixes and matches the use of memory_block ABI and
> kmem/hotplug ABI. It's suggests to use one or the other.
>
> Suggested-by: Hannes Reinecke <[email protected]>
> Suggested-by: David Hildenbrand <[email protected]>
> Signed-off-by: Gregory Price <[email protected]>
> ---
[...]
>
> +static int dax_kmem_parse_state(const char *buf)
> +{
> + if (sysfs_streq(buf, "unplugged"))
> + return DAX_KMEM_UNPLUGGED;
> + if (sysfs_streq(buf, "online"))
> + return MMOP_ONLINE;
> + if (sysfs_streq(buf, "online_kernel"))
> + return MMOP_ONLINE_KERNEL;
> + if (sysfs_streq(buf, "online_movable"))
> + return MMOP_ONLINE_MOVABLE;
> + return -EINVAL;
Should we try making use of mhp_online_type_from_str()/online_type_to_str()
[possibly a nicer exported function for the latter] to avoid duplicating this
...
> +}
> +
> +static ssize_t hotplug_show(struct device *dev,
> + struct device_attribute *attr, char *buf)
> +{
> + struct dax_kmem_data *data = dev_get_drvdata(dev);
> + const char *state_str;
> +
> + if (!data)
> + return -ENXIO;
> +
> + switch (data->state) {
> + case DAX_KMEM_UNPLUGGED:
> + state_str = "unplugged";
> + break;
> + case MMOP_OFFLINE:
> + state_str = "offline";
> + break;
> + case MMOP_ONLINE:
> + state_str = "online";
> + break;
> + case MMOP_ONLINE_KERNEL:
> + state_str = "online_kernel";
> + break;
> + case MMOP_ONLINE_MOVABLE:
> + state_str = "online_movable";
> + break;
...
and this?
[sorry if we already discussed this]
--
Cheers,
David