On 21.08.25 12:05, Lorenzo Stoakes wrote:
On Thu, Aug 21, 2025 at 11:30:43AM +0200, David Hildenbrand wrote:
I will add this xen/apply_to_page_range() thing to my TODOs, which atm
would involve changing these drivers to use vmf_insert_pfn_prot() instead.


Busy today (want to reply to Christian) but

a) Re: performance, we would want something like
    vmf_insert_pfns_prot(), similar to vm_insert_pages(), to bulk-insert
    multiple PFNs.

b) Re: PAT, we'll have to figure out why PAT information is wrong here
    (was there no previous PAT reservation from the driver?), but IF we
    really have to override, we'd want a way to tell
    vmf_insert_pfn_prot() to force the selected caching mode.
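The direction sketched in (a)/(b) might look roughly like the following in a driver's fault path. This is a non-compilable sketch against the existing kernel API; the my_drv_* names are hypothetical, and only vmf_insert_pfn_prot() is the real interface being discussed:

```c
/*
 * Sketch only: replacing an apply_to_page_range() batch insert with a
 * per-pfn vmf_insert_pfn_prot() call in a driver's ->fault() handler.
 * my_drv_lookup_pfn() is a hypothetical helper, not a kernel API.
 */
static vm_fault_t my_drv_fault(struct vm_fault *vmf)
{
	struct vm_area_struct *vma = vmf->vma;
	unsigned long pfn = my_drv_lookup_pfn(vma, vmf->address);

	/*
	 * vmf_insert_pfn_prot() consults PAT for the cachemode, so the
	 * driver is expected to have a matching memtype reservation in
	 * place; point (b) above is about what to do when it does not.
	 */
	return vmf_insert_pfn_prot(vma, vmf->address, pfn,
				   vma->vm_page_prot);
}
```

A hypothetical bulk variant, vmf_insert_pfns_prot() from point (a), would take an array or range of PFNs to amortize the per-call overhead, analogous to vm_insert_pages().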


Ack, ok good that we have a feasible way forward.

FYI, I spoke to Peter off-list and he mentioned he had a more general series
to get rid of this kind of [ab]use of apply_to_page_range() (see [0]). I
gather he hasn't had the time to resurrect it, but perhaps one of us can at
some point?

Perhaps we need a shorter-term fix for _this_ issue (which involves not
using this interface), and then follow it up with an adaptation of the
below?

We need to understand why PAT would be wrong here and why it would even be
OK to ignore it, rather than hacking around it.

FWIW, I just recently documented:

+/**
+ * pfnmap_setup_cachemode - setup the cachemode in the pgprot for a pfn range
+ * @pfn: the start of the pfn range
+ * @size: the size of the pfn range in bytes
+ * @prot: the pgprot to modify
+ *
+ * Lookup the cachemode for the pfn range starting at @pfn with the size
+ * @size and store it in @prot, leaving other data in @prot unchanged.
+ *
+ * This allows for a hardware implementation to have fine-grained control of
+ * memory cache behavior at page level granularity. Without a hardware
+ * implementation, this function does nothing.
+ *
+ * Currently there is only one implementation for this - x86 Page Attribute
+ * Table (PAT). See Documentation/arch/x86/pat.rst for more details.
+ *
+ * This function can fail if the pfn range spans pfns that require differing
+ * cachemodes. If the pfn range was previously verified to have a single
+ * cachemode, it is sufficient to query only a single pfn. The assumption is
+ * that this is the case for drivers using the vmf_insert_pfn*() interface.
+ *
+ * Returns 0 on success and -EINVAL on error.
+ */
+int pfnmap_setup_cachemode(unsigned long pfn, unsigned long size,
+               pgprot_t *prot);
 extern int track_pfn_copy(struct vm_area_struct *dst_vma,
                struct vm_area_struct *src_vma, unsigned long *pfn);
 extern void untrack_pfn_copy(struct vm_area_struct *dst_vma,
@@ -1563,6 +1584,21 @@ extern void untrack_pfn(struct vm_area_struct *vma, unsigned long pfn,
 extern void untrack_pfn_clear(struct vm_area_struct *vma);
 #endif
+/**
+ * pfnmap_setup_cachemode_pfn - setup the cachemode in the pgprot for a pfn
+ * @pfn: the pfn
+ * @prot: the pgprot to modify
+ *
+ * Lookup the cachemode for @pfn and store it in @prot, leaving other
+ * data in @prot unchanged.
+ *
+ * See pfnmap_setup_cachemode() for details.
+ */
+static inline void pfnmap_setup_cachemode_pfn(unsigned long pfn, pgprot_t *prot)
+{
+       pfnmap_setup_cachemode(pfn, PAGE_SIZE, prot);
+}


There is certainly something missing here: the driver should previously have
verified that the cachemode is as expected.

--
Cheers

David / dhildenb
