* Nico Pache <npa...@redhat.com> [250713 20:33]:
> The khugepaged daemon and madvise_collapse have two different
> implementations that do almost the same thing.
>
> Create collapse_single_pmd to increase code reuse and create an entry
> point to these two users.
>
> Refactor madvise_collapse and collapse_scan_mm_slot to use the new
> collapse_single_pmd function. This introduces a minor behavioral change
> that is most likely an undiscovered bug. The current implementation of
> khugepaged tests collapse_test_exit_or_disable before calling
> collapse_pte_mapped_thp, but we weren't doing it in the madvise_collapse
> case. By unifying these two callers madvise_collapse now also performs
> this check.
>
> Reviewed-by: Baolin Wang <baolin.w...@linux.alibaba.com>
> Signed-off-by: Nico Pache <npa...@redhat.com>
> ---
>  mm/khugepaged.c | 95 +++++++++++++++++++++++++------------------------
>  1 file changed, 49 insertions(+), 46 deletions(-)
>
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index eb0babb51868..47a80638af97 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -2362,6 +2362,50 @@ static int collapse_scan_file(struct mm_struct *mm, unsigned long addr,
>  	return result;
>  }
>
> +/*
> + * Try to collapse a single PMD starting at a PMD aligned addr, and return
> + * the results.
> + */
> +static int collapse_single_pmd(unsigned long addr,
> +		struct vm_area_struct *vma, bool *mmap_locked,
> +		struct collapse_control *cc)
> +{
> +	int result = SCAN_FAIL;
> +	struct mm_struct *mm = vma->vm_mm;
> +
> +	if (!vma_is_anonymous(vma)) {
> +		struct file *file = get_file(vma->vm_file);
> +		pgoff_t pgoff = linear_page_index(vma, addr);
> +
> +		mmap_read_unlock(mm);
> +		*mmap_locked = false;
Okay, just for my sanity: when we reach this part, mmap_locked will be
false on return. We set it a bunch more below, but it is always false by
the time we return. Although this is a cleaner implementation of the
locking, I'm just not sure why you keep flipping the mmap_locked
variable here? We could probably get away with a comment noting that it
will always be false.

> +		result = collapse_scan_file(mm, addr, file, pgoff, cc);
> +		fput(file);
> +		if (result == SCAN_PTE_MAPPED_HUGEPAGE) {
> +			mmap_read_lock(mm);
> +			*mmap_locked = true;
> +			if (collapse_test_exit_or_disable(mm)) {
> +				mmap_read_unlock(mm);
> +				*mmap_locked = false;
> +				result = SCAN_ANY_PROCESS;
> +				goto end;
> +			}
> +			result = collapse_pte_mapped_thp(mm, addr,
> +					!cc->is_khugepaged);
> +			if (result == SCAN_PMD_MAPPED)
> +				result = SCAN_SUCCEED;
> +			mmap_read_unlock(mm);
> +			*mmap_locked = false;
> +		}
> +	} else {
> +		result = collapse_scan_pmd(mm, vma, addr, mmap_locked, cc);
> +	}
> +	if (cc->is_khugepaged && result == SCAN_SUCCEED)
> +		++khugepaged_pages_collapsed;
> +end:
> +	return result;
> +}
> +
>  static unsigned int collapse_scan_mm_slot(unsigned int pages, int *result,
>  					  struct collapse_control *cc)
>  	__releases(&khugepaged_mm_lock)
> @@ -2436,34 +2480,9 @@ static unsigned int collapse_scan_mm_slot(unsigned int pages, int *result,
>  			VM_BUG_ON(khugepaged_scan.address < hstart ||
>  				  khugepaged_scan.address + HPAGE_PMD_SIZE >
>  				  hend);
> -			if (!vma_is_anonymous(vma)) {
> -				struct file *file = get_file(vma->vm_file);
> -				pgoff_t pgoff = linear_page_index(vma,
> -					khugepaged_scan.address);
> -
> -				mmap_read_unlock(mm);
> -				mmap_locked = false;
> -				*result = hpage_collapse_scan_file(mm,
> -					khugepaged_scan.address, file, pgoff, cc);
> -				fput(file);
> -				if (*result == SCAN_PTE_MAPPED_HUGEPAGE) {
> -					mmap_read_lock(mm);
> -					if (hpage_collapse_test_exit_or_disable(mm))
> -						goto breakouterloop;
> -					*result = collapse_pte_mapped_thp(mm,
> -						khugepaged_scan.address, false);
> -					if (*result == SCAN_PMD_MAPPED)
> -						*result = SCAN_SUCCEED;
> -					mmap_read_unlock(mm);
> -				}
> -			} else {
> -				*result = hpage_collapse_scan_pmd(mm, vma,
> -					khugepaged_scan.address, &mmap_locked, cc);
> -			}
> -
> -			if (*result == SCAN_SUCCEED)
> -				++khugepaged_pages_collapsed;
>
> +			*result = collapse_single_pmd(khugepaged_scan.address,
> +						      vma, &mmap_locked, cc);
>  			/* move to next address */
>  			khugepaged_scan.address += HPAGE_PMD_SIZE;
>  			progress += HPAGE_PMD_NR;
> @@ -2780,35 +2799,19 @@ int madvise_collapse(struct vm_area_struct *vma, unsigned long start,
>  		mmap_assert_locked(mm);
>  		memset(cc->node_load, 0, sizeof(cc->node_load));
>  		nodes_clear(cc->alloc_nmask);
> -		if (!vma_is_anonymous(vma)) {
> -			struct file *file = get_file(vma->vm_file);
> -			pgoff_t pgoff = linear_page_index(vma, addr);
>
> -			mmap_read_unlock(mm);
> -			mmap_locked = false;
> -			result = hpage_collapse_scan_file(mm, addr, file, pgoff,
> -							  cc);
> -			fput(file);
> -		} else {
> -			result = hpage_collapse_scan_pmd(mm, vma, addr,
> -							 &mmap_locked, cc);
> -		}
> +		result = collapse_single_pmd(addr, vma, &mmap_locked, cc);
> +
>  		if (!mmap_locked)
>  			*lock_dropped = true;

All of this locking is scary, because there are comments everywhere that
imply mmap_locked indicates the lock was dropped at some point, but we
are using it to indicate that the lock is currently held - which are
very different things. Here, for example, lock_dropped may not be set to
true even though we have toggled the lock through collapse_single_pmd()
-> collapse_scan_pmd() -> ... -> collapse_huge_page(). Maybe these
scenarios are safe because of known limitations of what will or will not
happen, but having these code paths exist without a comment on why they
are safe seems like a good way to introduce races later.
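To spell out the distinction: "the lock is currently held" and "the lock
was dropped at some point" can disagree whenever a callee drops and
retakes the lock. A userspace toy model (hypothetical names, not kernel
code) of that hazard:

```c
#include <stdbool.h>

/*
 * Toy model of the two flag meanings discussed above. If a callee drops
 * and retakes the lock, a "currently held" flag ends up true again, and
 * a caller that only inspects it afterwards never learns that the
 * address space may have changed while the lock was not held. Tracking
 * "was dropped" separately preserves that history.
 */
static void toy_drop_and_retake(bool *currently_held, bool *was_dropped)
{
	/* mmap_read_unlock(mm); */
	*currently_held = false;
	*was_dropped = true;	/* the drop is recorded permanently */

	/* ...another thread may modify the address space here... */

	/* mmap_read_lock(mm); */
	*currently_held = true;	/* held again, but without was_dropped
				 * the transient drop is invisible */
}
```

A caller doing only `if (!currently_held) *lock_dropped = true;` after
this returns would leave lock_dropped false, which is the scenario the
comment above is worried about.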
>
> -handle_result:
>  	switch (result) {
>  	case SCAN_SUCCEED:
>  	case SCAN_PMD_MAPPED:
>  		++thps;
>  		break;
> -	case SCAN_PTE_MAPPED_HUGEPAGE:
> -		BUG_ON(mmap_locked);
> -		mmap_read_lock(mm);
> -		result = collapse_pte_mapped_thp(mm, addr, true);
> -		mmap_read_unlock(mm);
> -		goto handle_result;
>  	/* Whitelisted set of results where continuing OK */
> +	case SCAN_PTE_MAPPED_HUGEPAGE:
>  	case SCAN_PMD_NULL:
>  	case SCAN_PTE_NON_PRESENT:
>  	case SCAN_PTE_UFFD_WP:
> --
> 2.50.0
>
>
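To make the earlier mmap_locked observation concrete: on the file-backed
path the flag flips true and back to false internally, but every exit
from that path leaves it false, so the caller can never observe the
intermediate true value. A userspace toy model (hypothetical names, not
kernel code) of that shape:

```c
#include <stdbool.h>

/* Toy stand-ins for the scan results involved on this path. */
enum toy_result { TOY_FAIL, TOY_SUCCEED, TOY_PTE_MAPPED_HUGEPAGE };

static enum toy_result toy_scan_file(void)
{
	/* pretend the scan found a PTE-mapped huge page */
	return TOY_PTE_MAPPED_HUGEPAGE;
}

/*
 * Mirrors the file-backed branch of collapse_single_pmd(): the lock
 * flag is set true only between a retake and a release, and both exits
 * (the early-bail path and the normal path) clear it again, so *locked
 * is false on every return.
 */
static enum toy_result toy_collapse_file_backed(bool *locked)
{
	enum toy_result result;

	/* mmap_read_unlock(mm); */
	*locked = false;
	result = toy_scan_file();
	if (result == TOY_PTE_MAPPED_HUGEPAGE) {
		/* mmap_read_lock(mm); */
		*locked = true;		/* flipped true... */
		result = TOY_SUCCEED;
		/* mmap_read_unlock(mm); */
		*locked = false;	/* ...and always flipped back */
	}
	return result;			/* *locked is false on every exit */
}
```

Since the intermediate flips are unobservable, a single comment stating
the false-on-return invariant would convey the same information.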