On 26.08.25 14:07, David Hildenbrand wrote: >> >>> 2) We add another interface that consumes PFNs, but explicitly states >>> that it is only for ordinary system RAM, and that the user is >>> required for updating the direct map. >>> >>> We could sanity-check the direct map in debug kernels. >> >> I would rather like to see vmf_insert_pfn_prot() fixed instead. >> >> That function was explicitly added to insert the PFN with the given >> attributes and as far as I can see all users of that function expect exactly >> that. > > It's all a bit tricky :(
I would rather say horrible complicated :( >>> >>> 3) We teach PAT code in pfnmap_setup_cachemode_pfn() about treating this >>> system RAM differently. >>> >>> >>> There is also the option for a mixture between 1 and 2, where we get pages, >>> but we map them non-refcounted in a VM_PFNMAP. >>> >>> In general, having pages makes it easier to assert that they are likely >>> ordinary system ram pages, and that the interface is not getting abused for >>> something else. >> >> Well, exactly that's the use case here and that is not abusive at all as far >> as I can see. >> >> What drivers want is to insert a PFN with a certain set of caching >> attributes regardless if it's system memory or iomem. That's why >> vmf_insert_pfn_prot() was created in the first place. > > I mean, the use case of "allocate pages from the buddy and fixup the linear > map" sounds perfectly reasonable to me. Absolutely no reason to get PAT > involved. Nobody else should be messing with that memory after all. > > As soon as we are talking about other memory ranges (iomem) that are not from > the buddy, it gets weird to bypass PAT, and the question I am asking myself > is, when is it okay, and when not. Ok let me try to explain parts of the history and the big picture for at least the graphics use case on x86. In 1996/97 Intel came up with the idea of AGP: https://en.wikipedia.org/wiki/Accelerated_Graphics_Port At that time the CPUs, PCI bus and system memory were all connected together through the north bridge: https://en.wikipedia.org/wiki/Northbridge_(computing) The problem was that AGP also introduced the concept of putting large amounts of data for the video controller (PCI device) into system memory when you don't have enough local device memory (VRAM). But that meant when that memory is cached that the north bridge always had to snoop the CPU cache over the front side bus for every access the video controller made. This meant a huge performance bottleneck, so the idea was born to access that data uncached. Well that was nearly 30years ago, PCI, AGP and front side bus are long gone, but the concept of putting video controller (GPU) stuff into uncached system memory has prevailed. So for example even modern AMD CPU based laptops need uncached system memory if their local memory is not large enough to contain the picture to display on the monitor. And with modern 8k monitors that can actually happen quite fast... What drivers do today is to call vmf_insert_pfn_prot() either with the PFN of their local memory (iomem) or uncached/wc system memory. To summarize that we have an interface to fill in the page tables with either iomem or system memory is actually part of the design. That's how the HW driver is expected to work. >> That drivers need to call set_pages_wc/uc() for the linear mapping on x86 >> manually is correct and checking that is clearly a good idea for debug >> kernels. > > I'll have to think about this a bit: assuming only vmf_insert_pfn() calls > pfnmap_setup_cachemode_pfn() but vmf_insert_pfn_prot() doesn't, how could we > sanity check that somebody is doing something against the will of PAT. I think the most defensive approach for a quick fix is this change here: static inline void pgprot_set_cachemode(pgprot_t *prot, enum page_cache_mode pcm) { - *prot = __pgprot((pgprot_val(*prot) & ~_PAGE_CACHE_MASK) | - cachemode2protval(pcm)); + if (pcm != _PAGE_CACHE_MODE_WB) + *prot = __pgprot((pgprot_val(*prot) & ~_PAGE_CACHE_MASK) | + cachemode2protval(pcm)); } This applies the PAT value if it's anything else than _PAGE_CACHE_MODE_WB but still allows callers to use something different on normal WB system memory. What do you think? Regards, Christian