On Sat, Mar 21, 2026 at 10:40:21AM -0700, Andrew Morton wrote:
> On Sat, 21 Mar 2026 11:03:56 -0400 Gregory Price <[email protected]> wrote:
> 
> > The dax kmem driver currently onlines memory during probe using the
> > system default policy, with no way to control or query the region state
> > at runtime - other than by inspecting the state of individual blocks.
> > 
> > Offlining and removing an entire region requires operating on individual
> > memory blocks, creating race conditions where external entities can
> > interfere between the offline and remove steps.
> > 
> > The problem was discussed specifically in the LPC2025 device memory
> > sessions - https://lpc.events/event/19/contributions/2016/ - where
> > it was discussed how the non-atomic interface for dax hotplug is causing
> > issues in some distributions which have competing userland controllers
> > that interfere with each other.
> > 
> > This series adds a sysfs "hotplug" attribute for atomic whole-device
> > hotplug control, along with the mm and dax plumbing to support it.
> 
> AI review (which hasn't completed at this time) has a lot to say:
>       
> https://sashiko.dev/#/patchset/[email protected]

Looking at the results - I introduced a UAF during the rebase that I
didn't catch during testing.  Will clean that up.

I also just realized I left an extern in one of the patches that I
thought I had removed.

So I owe a respin on this in more ways than one.

As for the non-trivial AI review comments:

Much of the remaining commentary concerns either the pre-existing race
condition in the code, or design questions around that race.

Specifically: userland can still try to twiddle the memoryN/state bits
while the dax device loops over non-contiguous regions.
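
For illustration, a minimal userland sketch of the per-block pattern in
question.  The helper names (block_state_path, set_block_state,
offline_region) and the block list are made up for this example; only
the memoryN/state files under /sys/devices/system/memory are the real
interface.  This is not code from the series, just a way to show where
the window sits:

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Build "<base>/memory<N>/state" for one memory block.
 * Returns 0 on success, -1 if the path does not fit in buf.
 */
int block_state_path(char *buf, size_t len, const char *base,
		     unsigned int block)
{
	int n = snprintf(buf, len, "%s/memory%u/state", base, block);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Write a state string ("online", "offline") to one memory block. */
int set_block_state(const char *base, unsigned int block, const char *state)
{
	char path[256];
	FILE *f;
	int ret = 0;

	if (block_state_path(path, sizeof(path), base, block))
		return -1;

	f = fopen(path, "w");
	if (!f)
		return -1;

	if (fputs(state, f) == EOF)
		ret = -1;
	/* A failed sysfs store may only surface when the stream is flushed. */
	if (fclose(f) == EOF)
		ret = -1;

	return ret;
}

/*
 * Offline every block of a (possibly non-contiguous) region, one at a
 * time.  Returns the number of blocks offlined before the first failure.
 *
 * Nothing pins the region between iterations, so another controller can
 * write "online" to an already-processed block while this loop is still
 * working on later ones.
 */
size_t offline_region(const char *base, const unsigned int *blocks,
		      size_t nr)
{
	size_t i;

	for (i = 0; i < nr; i++) {
		if (set_block_state(base, blocks[i], "offline"))
			break;
		/* Race window: blocks[0..i] can be re-onlined here. */
	}

	return i;
}
```

A caller would pass "/sys/devices/system/memory" as base plus the list
of block numbers backing the dax region.  Taking base as a parameter is
only there so the loop can be pointed at a scratch directory for testing.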

I dropped this commit:
https://lore.kernel.org/all/[email protected]/

From the series, because the feedback here:
https://lore.kernel.org/linux-mm/[email protected]/

suggested that offline_and_remove_memory() would resolve the race
condition problem - but the patch proposed actually solved two issues:

1) Inconsistent hotplug state issue (user is still using the old
   per-block offlining pattern)

2) The old offline pattern calling BUG() instead of WARN() when trying
   to unbind while things are still online.

But this raises the question: if the race condition in userland has
been around for many years, should it be considered a feature we must
not break - or, if not, on what time scale should we consider breaking it?

I don't know the answer; David will have to weigh in on that.

~Gregory
