On 9/11/25 22:42, David Hildenbrand wrote:
> On 08.09.25 02:04, Balbir Singh wrote:
>> Implement CPU fault handling for zone device THP entries through
>> do_huge_pmd_device_private(), enabling transparent migration of
>> device-private large pages back to system memory on CPU access.
>>
>> When the CPU accesses a zone device THP entry, the fault handler
>> calls the device driver's migrate_to_ram() callback to migrate
>> the entire large page back to system memory.
>>
>> Cc: Andrew Morton <a...@linux-foundation.org>
>> Cc: David Hildenbrand <da...@redhat.com>
>> Cc: Zi Yan <z...@nvidia.com>
>> Cc: Joshua Hahn <joshua.hah...@gmail.com>
>> Cc: Rakie Kim <rakie....@sk.com>
>> Cc: Byungchul Park <byungc...@sk.com>
>> Cc: Gregory Price <gou...@gourry.net>
>> Cc: Ying Huang <ying.hu...@linux.alibaba.com>
>> Cc: Alistair Popple <apop...@nvidia.com>
>> Cc: Oscar Salvador <osalva...@suse.de>
>> Cc: Lorenzo Stoakes <lorenzo.stoa...@oracle.com>
>> Cc: Baolin Wang <baolin.w...@linux.alibaba.com>
>> Cc: "Liam R. Howlett" <liam.howl...@oracle.com>
>> Cc: Nico Pache <npa...@redhat.com>
>> Cc: Ryan Roberts <ryan.robe...@arm.com>
>> Cc: Dev Jain <dev.j...@arm.com>
>> Cc: Barry Song <bao...@kernel.org>
>> Cc: Lyude Paul <ly...@redhat.com>
>> Cc: Danilo Krummrich <d...@kernel.org>
>> Cc: David Airlie <airl...@gmail.com>
>> Cc: Simona Vetter <sim...@ffwll.ch>
>> Cc: Ralph Campbell <rcampb...@nvidia.com>
>> Cc: Mika Penttilä <mpent...@redhat.com>
>> Cc: Matthew Brost <matthew.br...@intel.com>
>> Cc: Francois Dugast <francois.dug...@intel.com>
>>
>> Signed-off-by: Balbir Singh <balb...@nvidia.com>
>> ---
>>  include/linux/huge_mm.h |  7 +++++++
>>  mm/huge_memory.c        | 36 ++++++++++++++++++++++++++++++++++++
>>  mm/memory.c             |  6 ++++--
>>  3 files changed, 47 insertions(+), 2 deletions(-)
>>
>> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
>> index 23f124493c47..2c6a0c3c862c 100644
>> --- a/include/linux/huge_mm.h
>> +++ b/include/linux/huge_mm.h
>> @@ -496,6 +496,8 @@ static inline bool folio_test_pmd_mappable(struct folio *folio)
>>  vm_fault_t do_huge_pmd_numa_page(struct vm_fault *vmf);
>> +vm_fault_t do_huge_pmd_device_private(struct vm_fault *vmf);
>> +
>>  extern struct folio *huge_zero_folio;
>>  extern unsigned long huge_zero_pfn;
>> @@ -675,6 +677,11 @@ static inline vm_fault_t do_huge_pmd_numa_page(struct vm_fault *vmf)
>>          return 0;
>>  }
>> +static inline vm_fault_t do_huge_pmd_device_private(struct vm_fault *vmf)
>> +{
>
> Is this a VM_WARN_ON_ONCE() or similar? (Maybe BUILD_BUG is possible?)
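
One option, as an untested sketch (whether BUILD_BUG() can be used instead
depends on the !CONFIG_TRANSPARENT_HUGEPAGE call sites being compiled out):

static inline vm_fault_t do_huge_pmd_device_private(struct vm_fault *vmf)
{
        /* Only reachable with CONFIG_TRANSPARENT_HUGEPAGE; warn if we get here. */
        VM_WARN_ON_ONCE(1);
        return 0;
}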
>
>> +        return 0;
>> +}
>> +
>>  static inline bool is_huge_zero_folio(const struct folio *folio)
>>  {
>>          return false;
>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>> index b720870c04b2..d634b2157a56 100644
>> --- a/mm/huge_memory.c
>> +++ b/mm/huge_memory.c
>> @@ -1287,6 +1287,42 @@ static vm_fault_t __do_huge_pmd_anonymous_page(struct vm_fault *vmf)
>>  }
>> +vm_fault_t do_huge_pmd_device_private(struct vm_fault *vmf)
>> +{
>> +        struct vm_area_struct *vma = vmf->vma;
>> +        vm_fault_t ret = 0;
>> +        spinlock_t *ptl;
>> +        swp_entry_t swp_entry;
>> +        struct page *page;
>> +
>> +        if (vmf->flags & FAULT_FLAG_VMA_LOCK) {
>> +                vma_end_read(vma);
>> +                return VM_FAULT_RETRY;
>> +        }
>> +
>> +        ptl = pmd_lock(vma->vm_mm, vmf->pmd);
>> +        if (unlikely(!pmd_same(*vmf->pmd, vmf->orig_pmd))) {
>> +                spin_unlock(ptl);
>> +                return 0;
>> +        }
>> +
>> +        swp_entry = pmd_to_swp_entry(vmf->orig_pmd);
>> +        page = pfn_swap_entry_to_page(swp_entry);
>> +        vmf->page = page;
>> +        vmf->pte = NULL;
>> +        if (trylock_page(vmf->page)) {
>> +                get_page(page);
>> +                spin_unlock(ptl);
>> +                ret = page_pgmap(page)->ops->migrate_to_ram(vmf);
>> +                unlock_page(vmf->page);
>> +                put_page(page);
>> +        } else {
>> +                spin_unlock(ptl);
>> +        }
>> +
>> +        return ret;
>> +}
>> +
>>  /*
>>   * always: directly stall for all thp allocations
>>   * defer: wake kswapd and fail if not immediately available
>> diff --git a/mm/memory.c b/mm/memory.c
>> index d9de6c056179..860665f4b692 100644
>> --- a/mm/memory.c
>> +++ b/mm/memory.c
>> @@ -6298,8 +6298,10 @@ static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma,
>>          vmf.orig_pmd = pmdp_get_lockless(vmf.pmd);
>>          if (unlikely(is_swap_pmd(vmf.orig_pmd))) {
>> -                VM_BUG_ON(thp_migration_supported() &&
>> -                          !is_pmd_migration_entry(vmf.orig_pmd));
>> +                if (is_device_private_entry(
>> +                                pmd_to_swp_entry(vmf.orig_pmd)))
>
> Single line please.
Ack

>
> But didn't we have a pmd helper for that?
>

This is a single if that handles is_swap_pmd(), and then both
is_device_private_entry() and is_pmd_migration_entry() are handled under
that (rough sketch at the end of this mail).

>> +                        return do_huge_pmd_device_private(&vmf);
>> +
>>                  if (is_pmd_migration_entry(vmf.orig_pmd))
>>                          pmd_migration_entry_wait(mm, vmf.pmd);
>>                  return 0;
>
> Thanks,

Balbir
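
PS: for illustration only, the folded single-line check could look roughly
like this (untested sketch; is_pmd_device_private_entry() below is a
hypothetical pmd-level helper, not something this patch adds):

        vmf.orig_pmd = pmdp_get_lockless(vmf.pmd);
        if (unlikely(is_swap_pmd(vmf.orig_pmd))) {
                /* Device-private THP: let the driver migrate it back to RAM. */
                if (is_device_private_entry(pmd_to_swp_entry(vmf.orig_pmd)))
                        return do_huge_pmd_device_private(&vmf);
                /*
                 * A pmd-level wrapper, e.g. is_pmd_device_private_entry(vmf.orig_pmd),
                 * could hide the pmd_to_swp_entry() conversion above.
                 */
                if (is_pmd_migration_entry(vmf.orig_pmd))
                        pmd_migration_entry_wait(mm, vmf.pmd);
                return 0;
        }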