Re:

Christian König Tue, 26 Aug 2025 15:06:52 -0700

On 26.08.25 14:07, David Hildenbrand wrote: 
>>
>>> 2) We add another interface that consumes PFNs, but explicitly states
>>>     that it is only for ordinary system RAM, and that the user is
>>>     required for updating the direct map.
>>>
>>>     We could sanity-check the direct map in debug kernels.
>>
>> I would rather like to see vmf_insert_pfn_prot() fixed instead.
>>
>> That function was explicitly added to insert the PFN with the given 
>> attributes and as far as I can see all users of that function expect exactly 
>> that.
> 
> It's all a bit tricky :(


I would rather say horrible complicated :(

>>>
>>> 3) We teach PAT code in pfnmap_setup_cachemode_pfn() about treating this
>>>     system RAM differently.
>>>
>>>
>>> There is also the option for a mixture between 1 and 2, where we get pages, 
>>> but we map them non-refcounted in a VM_PFNMAP.
>>>
>>> In general, having pages makes it easier to assert that they are likely 
>>> ordinary system ram pages, and that the interface is not getting abused for 
>>> something else.
>>
>> Well, exactly that's the use case here and that is not abusive at all as far 
>> as I can see.
>>
>> What drivers want is to insert a PFN with a certain set of caching 
>> attributes regardless if it's system memory or iomem. That's why 
>> vmf_insert_pfn_prot() was created in the first place.
> 
> I mean, the use case of "allocate pages from the buddy and fixup the linear 
> map" sounds perfectly reasonable to me. Absolutely no reason to get PAT 
> involved. Nobody else should be messing with that memory after all.
> 
> As soon as we are talking about other memory ranges (iomem) that are not from 
> the buddy, it gets weird to bypass PAT, and the question I am asking myself 
> is, when is it okay, and when not.

Ok let me try to explain parts of the history and the big picture for at least 
the graphics use case on x86.

In 1996/97 Intel came up with the idea of AGP: 
https://en.wikipedia.org/wiki/Accelerated_Graphics_Port

At that time the CPUs, PCI bus and system memory were all connected together 
through the north bridge: https://en.wikipedia.org/wiki/Northbridge_(computing)

The problem was that AGP also introduced the concept of putting large amounts 
of data for the video controller (PCI device) into system memory when you don't 
have enough local device memory (VRAM).

But that meant when that memory is cached that the north bridge always had to 
snoop the CPU cache over the front side bus for every access the video 
controller made. This meant a huge performance bottleneck, so the idea was born 
to access that data uncached.


Well that was nearly 30years ago, PCI, AGP and front side bus are long gone, 
but the concept of putting video controller (GPU) stuff into uncached system 
memory has prevailed.

So for example even modern AMD CPU based laptops need uncached system memory if 
their local memory is not large enough to contain the picture to display on the 
monitor. And with modern 8k monitors that can actually happen quite fast...

What drivers do today is to call vmf_insert_pfn_prot() either with the PFN of 
their local memory (iomem) or uncached/wc system memory.


To summarize that we have an interface to fill in the page tables with either 
iomem or system memory is actually part of the design. That's how the HW driver 
is expected to work.

>> That drivers need to call set_pages_wc/uc() for the linear mapping on x86 
>> manually is correct and checking that is clearly a good idea for debug 
>> kernels.
> 
> I'll have to think about this a bit: assuming only vmf_insert_pfn() calls 
> pfnmap_setup_cachemode_pfn() but vmf_insert_pfn_prot() doesn't, how could we 
> sanity check that somebody is doing something against the will of PAT.

I think the most defensive approach for a quick fix is this change here:

 static inline void pgprot_set_cachemode(pgprot_t *prot, enum page_cache_mode 
pcm)
 {
-       *prot = __pgprot((pgprot_val(*prot) & ~_PAGE_CACHE_MASK) |
-                        cachemode2protval(pcm));
+       if (pcm != _PAGE_CACHE_MODE_WB)
+               *prot = __pgprot((pgprot_val(*prot) & ~_PAGE_CACHE_MASK) |
+                                cachemode2protval(pcm));
 }

This applies the PAT value if it's anything else than _PAGE_CACHE_MODE_WB but 
still allows callers to use something different on normal WB system memory.

What do you think?

Regards,
Christian

Re:

Reply via email to