On 5/5/22 20:39, Baolin Wang wrote:
> 
> On 5/6/2022 7:53 AM, Mike Kravetz wrote:
>> On 4/29/22 01:14, Baolin Wang wrote:
>>> On some architectures (like ARM64), it can support CONT-PTE/PMD size
>>> hugetlb, which means it can support not only PMD/PUD size hugetlb:
>>> 2M and 1G, but also CONT-PTE/PMD size: 64K and 32M if a 4K page
>>> size specified.
>> <snip>
>>> diff --git a/mm/rmap.c b/mm/rmap.c
>>> index 6fdd198..7cf2408 100644
>>> --- a/mm/rmap.c
>>> +++ b/mm/rmap.c
>>> @@ -1924,13 +1924,15 @@ static bool try_to_migrate_one(struct folio *folio, 
>>> struct vm_area_struct *vma,
>>>                       break;
>>>                   }
>>>               }
>>> +
>>> +            /* Nuke the hugetlb page table entry */
>>> +            pteval = huge_ptep_clear_flush(vma, address, pvmw.pte);
>>>           } else {
>>>               flush_cache_page(vma, address, pte_pfn(*pvmw.pte));
>>> +            /* Nuke the page table entry. */
>>> +            pteval = ptep_clear_flush(vma, address, pvmw.pte);
>>>           }
>>>   
>>
>> On arm64 with CONT-PTE/PMD the returned pteval will have dirty or young set
>> if ANY of the PTE/PMDs had dirty or young set.
> 
> Right.
> 
>>
>>> -        /* Nuke the page table entry. */
>>> -        pteval = ptep_clear_flush(vma, address, pvmw.pte);
>>> -
>>>           /* Set the dirty flag on the folio now the pte is gone. */
>>>           if (pte_dirty(pteval))
>>>               folio_mark_dirty(folio);
>>> @@ -2015,7 +2017,10 @@ static bool try_to_migrate_one(struct folio *folio, 
>>> struct vm_area_struct *vma,
>>>               pte_t swp_pte;
>>>                 if (arch_unmap_one(mm, vma, address, pteval) < 0) {
>>> -                set_pte_at(mm, address, pvmw.pte, pteval);
>>> +                if (folio_test_hugetlb(folio))
>>> +                    set_huge_pte_at(mm, address, pvmw.pte, pteval);
>>
>> And, we will use that pteval for ALL the PTE/PMDs here.  So, we would set
>> the dirty or young bit in ALL PTE/PMDs.
>>
>> Could that cause any issues?  May be more of a question for the arm64 people.
> 
> I don't think this will cause any issues. Since the hugetlb can not be split, 
> and we should not lose the the dirty or young state if any subpages were set. 
> Meanwhile we already did like this in hugetlb.c:
> 
> pte = huge_ptep_get_and_clear(mm, address, ptep);
> tlb_remove_huge_tlb_entry(h, tlb, ptep, address);
> if (huge_pte_dirty(pte))
>     set_page_dirty(page);
> 

Agree that it 'should not' cause issues.  It just seems inconsistent.
This is not a problem specifically with your patch, just the handling of
CONT-PTE/PMD entries.

There does not appear to be an arm64 specific version of huge_ptep_get()
that takes CONT-PTE/PMD into account.  So, huge_ptep_get() would only
return the one specific value.  It would not take into account the dirty
or young bits of CONT-PTE/PMDs like your new version of
huge_ptep_get_and_clear.  Is that correct?  Or, am I missing something.

If I am correct, then code like the following may not work:

static int gather_hugetlb_stats(pte_t *pte, unsigned long hmask,
                unsigned long addr, unsigned long end, struct mm_walk *walk)
{
        pte_t huge_pte = huge_ptep_get(pte);
        struct numa_maps *md;
        struct page *page;

        if (!pte_present(huge_pte))
                return 0;

        page = pte_page(huge_pte);

        md = walk->private;
        gather_stats(page, md, pte_dirty(huge_pte), 1);
        return 0;
}

-- 
Mike Kravetz

Reply via email to