On 13.05.25 17:53, Liam R. Howlett wrote:
* David Hildenbrand <da...@redhat.com> [250512 08:34]:
On top of mm-unstable.

VM_PAT annoyed me too much and wasted too much of my time, let's clean
PAT handling up and remove VM_PAT.

This should sort out various issues with VM_PAT we discovered recently,
and will hopefully make the whole code more stable and easier to maintain.

In essence: we stop letting PAT mode mess with VMAs and instead lift
what to track/untrack to the MM core. We remember per VMA which pfn range
we tracked in a new struct we attach to a VMA (we have space without
exceeding 192 bytes), use a kref to share it among VMAs during
split/mremap/fork, and automatically untrack once the kref drops to 0.
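The lifecycle described above could be sketched roughly like this, as plain userspace C for illustration only. All names here (pfnmap_track_ctx, ctx_get/ctx_put, pfnmap_untrack) are made up for the sketch and are not the actual kernel API; the real code would use struct kref and the PAT track/untrack helpers.

```c
#include <assert.h>
#include <stdlib.h>

/* Sketch of a per-VMA tracking context shared across split/mremap/fork. */
struct pfnmap_track_ctx {
	int kref;		/* stands in for struct kref */
	unsigned long pfn;	/* start of the tracked pfn range */
	unsigned long size;	/* size of the tracked pfn range */
};

static int untrack_calls;	/* counts how often we "untrack" */

static void pfnmap_untrack(unsigned long pfn, unsigned long size)
{
	/* would perform the actual PAT untracking in the kernel */
	untrack_calls++;
}

static struct pfnmap_track_ctx *ctx_alloc(unsigned long pfn,
					  unsigned long size)
{
	struct pfnmap_track_ctx *ctx = malloc(sizeof(*ctx));

	ctx->kref = 1;
	ctx->pfn = pfn;
	ctx->size = size;
	return ctx;
}

/* VMA split/mremap/fork: the new VMA simply shares the same ctx. */
static struct pfnmap_track_ctx *ctx_get(struct pfnmap_track_ctx *ctx)
{
	ctx->kref++;
	return ctx;
}

/* VMA teardown: untrack only once the last reference goes away. */
static void ctx_put(struct pfnmap_track_ctx *ctx)
{
	if (--ctx->kref == 0) {
		pfnmap_untrack(ctx->pfn, ctx->size);
		free(ctx);
	}
}
```

The point of the sketch: after a split, two VMAs reference one ctx, and unmapping only one of them does not untrack anything; the full range is untracked exactly once, when the last VMA goes away.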

What you do here seems to be to decouple the tracked range from the vma start/end addresses by abstracting it into a separately allocated, refcounted struct.  This is close to what we do with the anon vma name.

Yes, inspired by that.


It took a while to understand the underlying interval tree tracking of
this change, but I think it's as good as it was.  IIRC, there was a
shrinking and matching to the end address in the interval tree, but I
failed to find that commit and code - maybe it never made it upstream.
I was able to find a thread about splitting [1], so maybe I'm mistaken.

There was hidden code that kept mremap() shrinking working (adjusting the tracked range).

The leftovers are removed in patch #8.

See below.



This implies that we'll keep tracking a full pfn range even after partially
unmapping it, until fully unmapping it; but as that case was mostly broken
before, this at least makes it work in a way that is least intrusive to
VMA handling.

Shrinking with mremap() used to work in a hacky way; now we'll similarly keep the original pfn range tracked even after this form of partial unmap. Does anybody care about that? Unlikely. If we run into issues, we could likely handle that (adjust the tracking) when our kref drops to 1 while freeing a VMA. But that adds more complexity, so avoid it for now.

The decoupling of the vma and refcounted range means that we could beef up the backend to support actually tracking the correct range, which would be nice.

Right, in patch #4 I have

"
This change implies that we'll keep tracking the original PFN range even after splitting + partially unmapping it: not too bad, because it was not working reliably before. The only thing that kind-of worked before was shrinking such a mapping using mremap(): we managed to adjust the reservation in a hacky way, now we won't adjust the reservation but leave it around until all involved VMAs are gone.

If that ever turns out to be an issue, we could hook into VM splitting
code and split the tracking; however, that adds complexity that might
not be required, so we'll keep it simple for now.
"

Duplicating/moving/forking VMAs is now definitely better than before.

Splitting is also arguably better than before -- even a simple partial munmap() [1] is currently problematic, unless we're munmapping the last part of a VMA (-> shrinking).

Implementing splitting properly is a bit complicated if the pfnmap ctx has more than one ref, but it could be added if ever really required.

[1] https://lkml.kernel.org/r/20250509153033.952746-1-da...@redhat.com
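To illustrate why splitting the tracking is awkward when the ctx is shared, here is a self-contained userspace C sketch (pfnmap_ctx and ctx_split() are hypothetical names, not the kernel API): with a single reference, the range can simply be cut in two; with multiple references, other VMAs still expect the full range, so the underlying reservation itself would have to be split, which can fail.

```c
#include <assert.h>
#include <stdlib.h>

/* Illustrative only; names are made up for this sketch. */
struct pfnmap_ctx {
	int kref;
	unsigned long pfn, size;
};

/*
 * Split the tracked range at @offset into @ctx (head) and a new tail.
 * Easy when we hold the only reference; with kref > 1, other VMAs
 * still cover the old range, so we cannot shrink it under them.
 */
static int ctx_split(struct pfnmap_ctx *ctx, unsigned long offset,
		     struct pfnmap_ctx **new_ctx)
{
	struct pfnmap_ctx *tail;

	if (ctx->kref > 1)
		return -1;	/* would require splitting the reservation */

	tail = malloc(sizeof(*tail));
	tail->kref = 1;
	tail->pfn = ctx->pfn + offset;
	tail->size = ctx->size - offset;
	ctx->size = offset;
	*new_ctx = tail;
	return 0;
}
```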

but I have very little desire to work on that.

Yep :)



[1] 
https://lore.kernel.org/all/5jrd43vusvcchpk2x6mouighkfhamjpaya5fu2cvikzaieg5pq@wqccwmjs4ian/


Briefly tested with the new pfnmap selftests [1].

[1] https://lkml.kernel.org/r/20250509153033.952746-1-da...@redhat.com

oh yes, that's still a pr_info() log.  I think that should be a pr_err()
at least?

I was wondering if that should actually be a WARN_ON_ONCE(). Now, it should be much harder to actually trigger.

Thanks!

--
Cheers,

David / dhildenb
