On Tue, Sep 23, 2025 at 01:44:20PM +1000, Balbir Singh wrote:
> On 9/23/25 12:23, Zi Yan wrote:
> > On 16 Sep 2025, at 8:21, Balbir Singh wrote:
> > 
> >> Extend migrate_vma_collect_pmd() to handle partially mapped large folios
> >> that require splitting before migration can proceed.
> >>
> >> During PTE walk in the collection phase, if a large folio is only
> >> partially mapped in the migration range, it must be split to ensure the
> >> folio is correctly migrated.
> >>
> >> Signed-off-by: Balbir Singh <balb...@nvidia.com>
> >> Cc: David Hildenbrand <da...@redhat.com>
> >> Cc: Zi Yan <z...@nvidia.com>
> >> Cc: Joshua Hahn <joshua.hah...@gmail.com>
> >> Cc: Rakie Kim <rakie....@sk.com>
> >> Cc: Byungchul Park <byungc...@sk.com>
> >> Cc: Gregory Price <gou...@gourry.net>
> >> Cc: Ying Huang <ying.hu...@linux.alibaba.com>
> >> Cc: Alistair Popple <apop...@nvidia.com>
> >> Cc: Oscar Salvador <osalva...@suse.de>
> >> Cc: Lorenzo Stoakes <lorenzo.stoa...@oracle.com>
> >> Cc: Baolin Wang <baolin.w...@linux.alibaba.com>
> >> Cc: "Liam R. Howlett" <liam.howl...@oracle.com>
> >> Cc: Nico Pache <npa...@redhat.com>
> >> Cc: Ryan Roberts <ryan.robe...@arm.com>
> >> Cc: Dev Jain <dev.j...@arm.com>
> >> Cc: Barry Song <bao...@kernel.org>
> >> Cc: Lyude Paul <ly...@redhat.com>
> >> Cc: Danilo Krummrich <d...@kernel.org>
> >> Cc: David Airlie <airl...@gmail.com>
> >> Cc: Simona Vetter <sim...@ffwll.ch>
> >> Cc: Ralph Campbell <rcampb...@nvidia.com>
> >> Cc: Mika Penttilä <mpent...@redhat.com>
> >> Cc: Matthew Brost <matthew.br...@intel.com>
> >> Cc: Francois Dugast <francois.dug...@intel.com>
> >> ---
> >>  mm/migrate_device.c | 82 +++++++++++++++++++++++++++++++++++++++++++++
> >>  1 file changed, 82 insertions(+)
> >>
> >> diff --git a/mm/migrate_device.c b/mm/migrate_device.c
> >> index abd9f6850db6..70c0601f70ea 100644
> >> --- a/mm/migrate_device.c
> >> +++ b/mm/migrate_device.c
> >> @@ -54,6 +54,53 @@ static int migrate_vma_collect_hole(unsigned long start,
> >>    return 0;
> >>  }
> >>
> >> +/**
> >> + * migrate_vma_split_folio() - Helper function to split a THP folio
> >> + * @folio: the folio to split
> >> + * @fault_page: struct page associated with the fault if any
> >> + *
> >> + * Returns 0 on success
> >> + */
> >> +static int migrate_vma_split_folio(struct folio *folio,
> >> +                             struct page *fault_page)
> >> +{
> >> +  int ret;
> >> +  struct folio *fault_folio = fault_page ? page_folio(fault_page) : NULL;
> >> +  struct folio *new_fault_folio = NULL;
> >> +
> >> +  if (folio != fault_folio) {
> >> +          folio_get(folio);
> >> +          folio_lock(folio);
> >> +  }
> >> +
> >> +  ret = split_folio(folio);
> >> +  if (ret) {
> >> +          if (folio != fault_folio) {
> >> +                  folio_unlock(folio);
> >> +                  folio_put(folio);
> >> +          }
> >> +          return ret;
> >> +  }
> >> +
> >> +  new_fault_folio = fault_page ? page_folio(fault_page) : NULL;
> >> +
> >> +  /*
> >> +   * Ensure the lock is held on the correct
> >> +   * folio after the split
> >> +   */
> >> +  if (!new_fault_folio) {
> >> +          folio_unlock(folio);
> >> +          folio_put(folio);
> >> +  } else if (folio != new_fault_folio) {
> >> +          folio_get(new_fault_folio);
> >> +          folio_lock(new_fault_folio);
> >> +          folio_unlock(folio);
> >> +          folio_put(folio);
> >> +  }
> >> +
> >> +  return 0;
> >> +}
> >> +
> >>  static int migrate_vma_collect_pmd(pmd_t *pmdp,
> >>                               unsigned long start,
> >>                               unsigned long end,
> >> @@ -136,6 +183,8 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp,
> >>                     * page table entry. Other special swap entries are not
> >>                     * migratable, and we ignore regular swapped page.
> >>                     */
> >> +                  struct folio *folio;
> >> +
> >>                    entry = pte_to_swp_entry(pte);
> >>                    if (!is_device_private_entry(entry))
> >>                            goto next;
> >> @@ -147,6 +196,23 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp,
> >>                        pgmap->owner != migrate->pgmap_owner)
> >>                            goto next;
> >>
> >> +                  folio = page_folio(page);
> >> +                  if (folio_test_large(folio)) {
> >> +                          int ret;
> >> +
> >> +                          pte_unmap_unlock(ptep, ptl);
> >> +                          ret = migrate_vma_split_folio(folio,
> >> +                                                    migrate->fault_page);
> >> +
> >> +                          if (ret) {
> >> +                                  ptep = pte_offset_map_lock(mm, pmdp, addr, &ptl);
> >> +                                  goto next;
> >> +                          }
> >> +
> >> +                          addr = start;
> >> +                          goto again;
> >> +                  }
> > 
> > This does not look right to me.
> > 
> > The folio here is device private, but migrate_vma_split_folio()
> > calls split_folio(), which cannot handle device private folios yet.
> > Your change to split_folio() is in Patch 10 and should be moved
> > before this patch.
> > 
> 
> Patch 10 splits the folio in the middle of migration (after the
> entries have been converted to migration entries). This patch relies
> on the changes in patch 4. I agree the names are confusing; I'll
> rename the functions.

Hi Balbir,

I am still reviewing the patches, but I think I agree with Zi here.

split_folio() will replace the PMD mappings of the huge folio with PTE
mappings, and will also split the folio itself into smaller folios. The
former is fine for this patch, but the latter is probably not correct
when the folio is a zone device folio. The driver needs to know about
the change, because it will usually maintain some sort of mapping
between GPU physical memory chunks and their corresponding zone device
pages.
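
To make that concrete, here is a minimal sketch of the kind of
driver-side state I have in mind. Everything below is hypothetical --
gpu_chunk, gpu_chunk_from_page and the fields are made up for
illustration and are not taken from this series or any in-tree driver:

#include <linux/mm_types.h>

/*
 * Hypothetical bookkeeping: the driver hands out GPU memory in chunks
 * and remembers, per chunk, which zone device folio backs it and at
 * what order it was allocated.
 */
struct gpu_chunk {
	unsigned long	dev_pfn;	/* base PFN in GPU physical memory */
	unsigned int	order;		/* order of the backing folio at allocation */
	struct page	*first_page;	/* first zone device page of the chunk */
};

/*
 * Reverse lookup used on device-side faults, assuming the driver
 * stashed the chunk pointer in zone_device_data at allocation time.
 * If core mm splits the folio without telling the driver, chunk->order
 * still claims the old order while the pages now sit in order-0
 * folios, so lookups like this return stale geometry.
 */
static struct gpu_chunk *gpu_chunk_from_page(struct page *page)
{
	return page->zone_device_data;
}

So whichever patch ends up teaching split_folio() about device private
folios probably also needs to give the driver a chance to refresh this
kind of state after the split.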
