On 6/5/26 23:19, Gregory Price wrote:
> The dax kmem driver currently onlines memory automatically during
> probe using the system's default online policy but provides no way
> to control or query the entire region state at runtime.
> 
> Additionally, there is no atomic mechanism to offline and remove
> the entire set of memory blocks together.  Instead, this is presently
> done in two steps: (offline all, remove all).  This creates a race
> condition where external entities can operate directly on the blocks
> and cause hot-unplug to fail.
> 
> Add a new 'hotplug' sysfs attribute that allows userspace to control
> and query the entire memory region state.  The writable states mirror
> the per-block /sys/devices/system/memory/memoryX/state ABI:
>   - "unplugged": memory blocks are not present
>   - "online": memory is online, zone chosen by the kernel
>   - "online_kernel": memory is online in ZONE_NORMAL
>   - "online_movable": memory is online in ZONE_MOVABLE
> 
> The "unplugged" state is new and only applies to kmem/hotplug.
> 
> Valid transitions:
>   - unplugged                               -> online[_kernel|_movable]
>   - online | online_kernel | online_movable -> unplugged
>   - offline                                 -> unplugged
> 
> A device can only be onlined from "unplugged", so it must be returned
> there before being onlined into a different state.
> 
> For backwards compatibility the memory blocks are always created at
> probe: existing tools expect them to be present once the kmem driver
> binds.  When the configured policy (mhp_get_default_online_type())
> selects an online state the blocks are onlined into that policy's zone;
> when the policy is offline the blocks are created but left offline and
> the device reports the state "offline".
> 
> "offline" is therefore a reportable state but is not writable: it only
> arises from the legacy auto_online_blocks=offline policy. Onlining such
> a device through this attribute requires unplugging it first.
> 
> The "offline" state may be deprecated later if the memory block ABI
> changes and userland migrates to using the region-wide hotplug.
> 
> Unplug is atomic across the whole device: dax_kmem_do_hotremove()
> collects every added range and offlines/removes them in one operation
> via offline_and_remove_memory_ranges().  Either all ranges are removed
> and the device becomes "unplugged", or offlining is rolled back and the
> device is left fully online, so the reported 'hotplug' state always
> matches reality.
> 
> Unbind Note:
>   We used to call remove_memory() during unbind, which would fire a
>   BUG() if any of the memory blocks were online at that time.  We lift
>   this into a WARN in the cleanup routine and don't attempt hotremove
>   if ->state is not DAX_KMEM_UNPLUGGED or MMOP_OFFLINE.  Memory that is
>   merely offline (the legacy "offline" state) is removed on unbind as
>   before; only online memory is left pinned.
> 
>   The resources are still leaked but this prevents deadlock on unbind
>   if a memory region happens to be impossible to hotremove.
> 
> Inconsistency Note:
> 
>   Since memory blocks can still be modified individually, the hotplug
>   attribute can become out of sync with the state of the system if
>   userland software mixes and matches the use of memory_block ABI and
>   kmem/hotplug ABI.  It's suggests to use one or the other.
> 
> Suggested-by: Hannes Reinecke <[email protected]>
> Suggested-by: David Hildenbrand <[email protected]>
> Signed-off-by: Gregory Price <[email protected]>
> ---

[...]

>  
> +static int dax_kmem_parse_state(const char *buf)
> +{
> +     if (sysfs_streq(buf, "unplugged"))
> +             return DAX_KMEM_UNPLUGGED;
> +     if (sysfs_streq(buf, "online"))
> +             return MMOP_ONLINE;
> +     if (sysfs_streq(buf, "online_kernel"))
> +             return MMOP_ONLINE_KERNEL;
> +     if (sysfs_streq(buf, "online_movable"))
> +             return MMOP_ONLINE_MOVABLE;
> +     return -EINVAL;

Should we try making use of mhp_online_type_from_str()/online_type_to_str()
[possibly a nicer exported function for the latter] to avoid duplicating this 
...

> +}
> +
> +static ssize_t hotplug_show(struct device *dev,
> +                         struct device_attribute *attr, char *buf)
> +{
> +     struct dax_kmem_data *data = dev_get_drvdata(dev);
> +     const char *state_str;
> +
> +     if (!data)
> +             return -ENXIO;
> +
> +     switch (data->state) {
> +     case DAX_KMEM_UNPLUGGED:
> +             state_str = "unplugged";
> +             break;
> +     case MMOP_OFFLINE:
> +             state_str = "offline";
> +             break;
> +     case MMOP_ONLINE:
> +             state_str = "online";
> +             break;
> +     case MMOP_ONLINE_KERNEL:
> +             state_str = "online_kernel";
> +             break;
> +     case MMOP_ONLINE_MOVABLE:
> +             state_str = "online_movable";
> +             break;

...

and this?


[sorry if we already discussed this]

-- 
Cheers,

David

Reply via email to