On 5/5/22 20:39, Baolin Wang wrote: > > On 5/6/2022 7:53 AM, Mike Kravetz wrote: >> On 4/29/22 01:14, Baolin Wang wrote: >>> On some architectures (like ARM64), it can support CONT-PTE/PMD size >>> hugetlb, which means it can support not only PMD/PUD size hugetlb: >>> 2M and 1G, but also CONT-PTE/PMD size: 64K and 32M if a 4K page >>> size specified. >> <snip> >>> diff --git a/mm/rmap.c b/mm/rmap.c >>> index 6fdd198..7cf2408 100644 >>> --- a/mm/rmap.c >>> +++ b/mm/rmap.c >>> @@ -1924,13 +1924,15 @@ static bool try_to_migrate_one(struct folio *folio, >>> struct vm_area_struct *vma, >>> break; >>> } >>> } >>> + >>> + /* Nuke the hugetlb page table entry */ >>> + pteval = huge_ptep_clear_flush(vma, address, pvmw.pte); >>> } else { >>> flush_cache_page(vma, address, pte_pfn(*pvmw.pte)); >>> + /* Nuke the page table entry. */ >>> + pteval = ptep_clear_flush(vma, address, pvmw.pte); >>> } >>> >> >> On arm64 with CONT-PTE/PMD the returned pteval will have dirty or young set >> if ANY of the PTE/PMDs had dirty or young set. > > Right. > >> >>> - /* Nuke the page table entry. */ >>> - pteval = ptep_clear_flush(vma, address, pvmw.pte); >>> - >>> /* Set the dirty flag on the folio now the pte is gone. */ >>> if (pte_dirty(pteval)) >>> folio_mark_dirty(folio); >>> @@ -2015,7 +2017,10 @@ static bool try_to_migrate_one(struct folio *folio, >>> struct vm_area_struct *vma, >>> pte_t swp_pte; >>> if (arch_unmap_one(mm, vma, address, pteval) < 0) { >>> - set_pte_at(mm, address, pvmw.pte, pteval); >>> + if (folio_test_hugetlb(folio)) >>> + set_huge_pte_at(mm, address, pvmw.pte, pteval); >> >> And, we will use that pteval for ALL the PTE/PMDs here. So, we would set >> the dirty or young bit in ALL PTE/PMDs. >> >> Could that cause any issues? May be more of a question for the arm64 people. > > I don't think this will cause any issues. Since the hugetlb can not be split, > and we should not lose the the dirty or young state if any subpages were set. > Meanwhile we already did like this in hugetlb.c: > > pte = huge_ptep_get_and_clear(mm, address, ptep); > tlb_remove_huge_tlb_entry(h, tlb, ptep, address); > if (huge_pte_dirty(pte)) > set_page_dirty(page); >
Agree that it 'should not' cause issues. It just seems inconsistent. This is not a problem specifically with your patch, just the handling of CONT-PTE/PMD entries. There does not appear to be an arm64 specific version of huge_ptep_get() that takes CONT-PTE/PMD into account. So, huge_ptep_get() would only return the one specific value. It would not take into account the dirty or young bits of CONT-PTE/PMDs like your new version of huge_ptep_get_and_clear. Is that correct? Or, am I missing something. If I am correct, then code like the following may not work: static int gather_hugetlb_stats(pte_t *pte, unsigned long hmask, unsigned long addr, unsigned long end, struct mm_walk *walk) { pte_t huge_pte = huge_ptep_get(pte); struct numa_maps *md; struct page *page; if (!pte_present(huge_pte)) return 0; page = pte_page(huge_pte); md = walk->private; gather_stats(page, md, pte_dirty(huge_pte), 1); return 0; } -- Mike Kravetz