On 29.08.25 21:52, David Hildenbrand wrote:
Yes, that can absolutely happen. But for iomem we would have an explicit call
to ioremap(), ioremap_wc(), ioremap_cache() for that before anybody would map
anything into userspace page tables.
But thinking more about it I just had an OMFG moment! Is it possible that the
PAT currently already has a problem with that?
We had customer projects where BARs of different PCIe devices ended up on
different physical addresses after a hot remove/re-add.
Is it possible that the PAT keeps enforcing certain caching attributes for a
physical address? For example, because a driver doesn't clean up properly
on hot remove?
If yes, then that would explain a massive number of problems we had with hot
add/remove.
The code is a mess, so if a driver messed up, likely anything is possible.
TBH, the more I look at this all, the more WTF moments I am having.
What I am currently wondering is: assume we get a
pfnmap_setup_cachemode_pfn() call and we could reliably identify whether
there was a previous registration, then we could do
(a) No previous registration: don't modify pgprot. Hopefully the driver
    knows what it is doing. Maybe we can add sanity checks that the
    direct map was already updated etc.
(b) A previous registration: modify pgprot like we do today.
That would work for me.
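To make (a)/(b) a bit more concrete, here is a rough userspace sketch of the
policy, not kernel code. Everything prefixed with model_ is made up for
illustration, including the lookup table and the cache-mode enum; the real
pfnmap_setup_cachemode_pfn() operates on a pgprot and the PAT memtype
tracking, which this does not attempt to reproduce.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cache modes; the names only echo the idea of PAT memtypes. */
enum model_cachemode { MODEL_WB, MODEL_WC, MODEL_UC };

/* One tracked registration: PFN range [start_pfn, end_pfn) and its mode. */
struct model_memtype {
	uint64_t start_pfn;
	uint64_t end_pfn;
	enum model_cachemode mode;
};

/* Hypothetical lookup: return the registration covering pfn, or NULL. */
static const struct model_memtype *
model_lookup(const struct model_memtype *tab, size_t n, uint64_t pfn)
{
	for (size_t i = 0; i < n; i++)
		if (pfn >= tab[i].start_pfn && pfn < tab[i].end_pfn)
			return &tab[i];
	return NULL;
}

/*
 * Model of the proposed policy:
 * (a) no previous registration: leave *mode untouched, return false
 * (b) previous registration: overwrite *mode with the tracked mode,
 *     return true
 */
static bool
model_setup_cachemode_pfn(const struct model_memtype *tab, size_t n,
			  uint64_t pfn, enum model_cachemode *mode)
{
	const struct model_memtype *m;

	assert(mode != NULL);

	m = model_lookup(tab, n, pfn);
	if (!m)
		return false;	/* (a): trust what the caller asked for */

	*mode = m->mode;	/* (b): enforce the registered mode */
	return true;
}
```

The boolean return value is only there so a caller could hang the sanity
checks mentioned in (a) off the "no registration" path; that part is an
assumption of the sketch, not something the current code does.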
System RAM is the problem. I wonder how many of these registrations we
really get and if we could just store them in the same tree as !system
RAM instead of abusing page flags.
commit 9542ada803198e6eba29d3289abb39ea82047b92
Author: Suresh Siddha <suresh.b.sid...@intel.com>
Date:   Wed Sep 24 08:53:33 2008 -0700

    x86: track memtype for RAM in page struct

    Track the memtype for RAM pages in page struct instead of using the
    memtype list. This avoids the explosion in the number of entries in
    memtype list (of the order of 20,000 with AGP) and makes the PAT
    tracking simpler.

    We are using PG_arch_1 bit in page->flags.

    We still use the memtype list for non RAM pages.
I do wonder if that explosion is still an issue today.
Yes it is. That is exactly the issue I'm working on here.
It's just that AGP was replaced by internal GPU MMUs over time and so we don't
use the old AGP code any more but just call get_free_pages() (or similar)
directly.
Okay, I thought I slowly understood how it works, then I stumbled over
the set_memory_uc() / set_memory_wc() implementation and now I am *all
confused*.
I mean, that does perform a PAT reservation.
But when is that reservation ever freed again? :/
Ah, set_memory_wb() does that. It just frees stuff. It should have been
called something like "reset", probably.
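For anyone else tripping over this, the pairing can be pictured with a toy
userspace model like the one below. It is a sketch under heavy assumptions:
the model_ functions and the fixed-size table are invented for illustration
and only mimic the "wc/uc takes a reservation, wb drops it" behaviour
described above; they are not the kernel implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_MAX_RESERVATIONS 16

/* One toy reservation covering the byte range [start, end). */
struct model_reservation {
	uint64_t start;
	uint64_t end;
	bool used;
};

static struct model_reservation model_table[MODEL_MAX_RESERVATIONS];

/* Number of reservations currently held in the toy table. */
static size_t model_reservation_count(void)
{
	size_t count = 0;

	for (size_t i = 0; i < MODEL_MAX_RESERVATIONS; i++)
		if (model_table[i].used)
			count++;
	return count;
}

/* Stand-in for set_memory_wc()/set_memory_uc(): takes a reservation. */
static int model_set_memory_wc(uint64_t addr, uint64_t len)
{
	assert(len > 0);

	for (size_t i = 0; i < MODEL_MAX_RESERVATIONS; i++) {
		if (model_table[i].used)
			continue;
		model_table[i].start = addr;
		model_table[i].end = addr + len;
		model_table[i].used = true;
		return 0;
	}
	return -1;	/* toy table is full */
}

/* Stand-in for set_memory_wb(): really a "reset" that drops the entry. */
static int model_set_memory_wb(uint64_t addr, uint64_t len)
{
	for (size_t i = 0; i < MODEL_MAX_RESERVATIONS; i++) {
		struct model_reservation *r = &model_table[i];

		if (r->used && r->start == addr && r->end == addr + len) {
			r->used = false;
			return 0;
		}
	}
	return -1;	/* nothing was reserved for this range */
}
```

In this model, a caller that does the wc call but never the matching wb call
leaves an entry behind forever, which is the kind of stale state a driver
that doesn't clean up on hot remove could plausibly cause.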
--
Cheers
David / dhildenb