On 16.09.25 14:21, Balbir Singh wrote:
Implement CPU fault handling for zone device THP entries through
do_huge_pmd_device_private(), enabling transparent migration of
device-private large pages back to system memory on CPU access.

When the CPU accesses a zone device THP entry, the fault handler calls the
device driver's migrate_to_ram() callback to migrate the entire large page
back to system memory.
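
[ For context only -- this is not part of the patch, and every name prefixed "my_" is a placeholder.  On the driver side, a migrate_to_ram() callback handling such a fault over a PMD-sized range could be structured roughly like the sketch below, loosely following the single-page handling in lib/test_hmm.c scaled up to HPAGE_PMD_SIZE; error handling, VMA-boundary clamping and the actual device-to-system copy are elided. ]

#include <linux/huge_mm.h>
#include <linux/memremap.h>
#include <linux/migrate.h>
#include <linux/mm.h>
#include <linux/slab.h>

static void *my_drvdata;	/* driver's pgmap_owner cookie (placeholder) */

/* Placeholder: allocate system folios into args->dst and copy the data. */
static void my_copy_back_to_sysram(struct migrate_vma *args);

static vm_fault_t my_devmem_migrate_to_ram(struct vm_fault *vmf)
{
	unsigned long start = ALIGN_DOWN(vmf->address, HPAGE_PMD_SIZE);
	unsigned long *src_pfns, *dst_pfns;
	struct migrate_vma args = {
		.vma		= vmf->vma,
		.start		= start,
		.end		= start + HPAGE_PMD_SIZE,
		.fault_page	= vmf->page,
		.pgmap_owner	= my_drvdata,
		.flags		= MIGRATE_VMA_SELECT_DEVICE_PRIVATE,
	};
	vm_fault_t ret = 0;

	src_pfns = kcalloc(HPAGE_PMD_NR, sizeof(*src_pfns), GFP_KERNEL);
	dst_pfns = kcalloc(HPAGE_PMD_NR, sizeof(*dst_pfns), GFP_KERNEL);
	if (!src_pfns || !dst_pfns) {
		ret = VM_FAULT_OOM;
		goto out;
	}
	args.src = src_pfns;
	args.dst = dst_pfns;

	/* Collect and isolate the device-private pages in the range. */
	if (migrate_vma_setup(&args)) {
		ret = VM_FAULT_SIGBUS;
		goto out;
	}

	my_copy_back_to_sysram(&args);

	/* Replace the device-private entries with the new system pages. */
	migrate_vma_pages(&args);
	migrate_vma_finalize(&args);
out:
	kfree(src_pfns);
	kfree(dst_pfns);
	return ret;
}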

Signed-off-by: Balbir Singh <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Zi Yan <[email protected]>
Cc: Joshua Hahn <[email protected]>
Cc: Rakie Kim <[email protected]>
Cc: Byungchul Park <[email protected]>
Cc: Gregory Price <[email protected]>
Cc: Ying Huang <[email protected]>
Cc: Alistair Popple <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: Lorenzo Stoakes <[email protected]>
Cc: Baolin Wang <[email protected]>
Cc: "Liam R. Howlett" <[email protected]>
Cc: Nico Pache <[email protected]>
Cc: Ryan Roberts <[email protected]>
Cc: Dev Jain <[email protected]>
Cc: Barry Song <[email protected]>
Cc: Lyude Paul <[email protected]>
Cc: Danilo Krummrich <[email protected]>
Cc: David Airlie <[email protected]>
Cc: Simona Vetter <[email protected]>
Cc: Ralph Campbell <[email protected]>
Cc: Mika Penttilä <[email protected]>
Cc: Matthew Brost <[email protected]>
Cc: Francois Dugast <[email protected]>
---
  include/linux/huge_mm.h |  7 +++++++
  mm/huge_memory.c        | 36 ++++++++++++++++++++++++++++++++++++
  mm/memory.c             |  5 +++--
  3 files changed, 46 insertions(+), 2 deletions(-)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index f327d62fc985..2d669be7f1c8 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -496,6 +496,8 @@ static inline bool folio_test_pmd_mappable(struct folio *folio)
  vm_fault_t do_huge_pmd_numa_page(struct vm_fault *vmf);
+vm_fault_t do_huge_pmd_device_private(struct vm_fault *vmf);
+
  extern struct folio *huge_zero_folio;
  extern unsigned long huge_zero_pfn;
@@ -671,6 +673,11 @@ static inline vm_fault_t do_huge_pmd_numa_page(struct vm_fault *vmf)
        return 0;
  }
+static inline vm_fault_t do_huge_pmd_device_private(struct vm_fault *vmf)
+{
+       return 0;
+}
+
  static inline bool is_huge_zero_folio(const struct folio *folio)
  {
        return false;
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 5291ee155a02..90a1939455dd 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1287,6 +1287,42 @@ static vm_fault_t __do_huge_pmd_anonymous_page(struct vm_fault *vmf)
  }

+vm_fault_t do_huge_pmd_device_private(struct vm_fault *vmf)
+{
+       struct vm_area_struct *vma = vmf->vma;
+       vm_fault_t ret = 0;
+       spinlock_t *ptl;
+       swp_entry_t swp_entry;
+       struct page *page;
+
+       if (vmf->flags & FAULT_FLAG_VMA_LOCK) {
+               vma_end_read(vma);
+               return VM_FAULT_RETRY;
+       }
+
+       ptl = pmd_lock(vma->vm_mm, vmf->pmd);
+       if (unlikely(!pmd_same(*vmf->pmd, vmf->orig_pmd))) {
+               spin_unlock(ptl);
+               return 0;
+       }
+
+       swp_entry = pmd_to_swp_entry(vmf->orig_pmd);
+       page = pfn_swap_entry_to_page(swp_entry);
+       vmf->page = page;
+       vmf->pte = NULL;
+       if (trylock_page(vmf->page)) {

We should be operating on a folio here. folio_trylock() + folio_get() + folio_unlock() + folio_put().
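
Roughly like this (untested sketch, just to spell out the idea; the failure path and the later unlock/put would mirror what the patch already does with the page variants):

	struct folio *folio = page_folio(page);

	if (folio_trylock(folio)) {
		folio_get(folio);
		spin_unlock(ptl);
		ret = page_pgmap(page)->ops->migrate_to_ram(vmf);
		folio_unlock(folio);
		folio_put(folio);
	}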

+               get_page(page);
+               spin_unlock(ptl);
+               ret = page_pgmap(page)->ops->migrate_to_ram(vmf);

BTW, I was wondering whether it is really the right design to pass the vmf here. Likely a const vma + addr + folio would be sufficient. I did not look into all the callbacks, though.
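
Something like this, purely to illustrate the idea (hypothetical, not an existing interface):

	/* current dev_pagemap_ops callback: */
	vm_fault_t (*migrate_to_ram)(struct vm_fault *vmf);

	/* possible narrower form: */
	vm_fault_t (*migrate_to_ram)(const struct vm_area_struct *vma,
				     unsigned long addr, struct folio *folio);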

--
Cheers

David / dhildenb
