On Tue, Dec 16, 2025 at 1:12 AM Baolin Wang <[email protected]> wrote:
>
> Hi Nico,
Hi Baolin!

Thanks for testing :) Did you happen to test with the changes I asked
Andrew to append to this commit?

Either way, I think your fixup makes more sense than mine.

Cheers,
-- Nico

>
> On 2025/12/2 01:46, Nico Pache wrote:
> > The current mechanism for determining mTHP collapse scales the
> > khugepaged_max_ptes_none value based on the target order. This
> > introduces an undesirable feedback loop, or "creep", when max_ptes_none
> > is set to a value greater than HPAGE_PMD_NR / 2.
> >
> > With this configuration, a successful collapse to order N will populate
> > enough pages to satisfy the collapse condition on order N+1 on the next
> > scan. This leads to unnecessary work and memory churn.
> >
> > To fix this issue introduce a helper function that will limit mTHP
> > collapse support to two max_ptes_none values, 0 and HPAGE_PMD_NR - 1.
> > This effectively supports two modes:
> >
> > - max_ptes_none=0: never introduce new none-pages for mTHP collapse.
> > - max_ptes_none=511 (on 4k pagesz): Always collapse to the highest
> >   available mTHP order.
> >
> > This removes the possibility of "creep", while not modifying any uAPI
> > expectations. A warning will be emitted if any non-supported
> > max_ptes_none value is configured with mTHP enabled.
> >
> > The limits can be ignored by passing full_scan=true, this is useful for
> > madvise_collapse (which ignores limits), or in the case of
> > collapse_scan_pmd(), allows the full PMD to be scanned when mTHP
> > collapse is available.
> >
> > Signed-off-by: Nico Pache <[email protected]>
> > ---
> >  mm/khugepaged.c | 43 ++++++++++++++++++++++++++++++++++++++++++-
> >  1 file changed, 42 insertions(+), 1 deletion(-)
> >
> > diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> > index 8dab49c53128..f425238d5d4f 100644
> > --- a/mm/khugepaged.c
> > +++ b/mm/khugepaged.c
> > @@ -463,6 +463,44 @@ void __khugepaged_enter(struct mm_struct *mm)
> >  		wake_up_interruptible(&khugepaged_wait);
> >  }
> >
> > +/**
> > + * collapse_max_ptes_none - Calculate maximum allowed empty PTEs for collapse
> > + * @order: The folio order being collapsed to
> > + * @full_scan: Whether this is a full scan (ignore limits)
> > + *
> > + * For madvise-triggered collapses (full_scan=true), all limits are bypassed
> > + * and allow up to HPAGE_PMD_NR - 1 empty PTEs.
> > + *
> > + * For PMD-sized collapses (order == HPAGE_PMD_ORDER), use the configured
> > + * khugepaged_max_ptes_none value.
> > + *
> > + * For mTHP collapses, we currently only support khugepaged_max_pte_none values
> > + * of 0 or (HPAGE_PMD_NR - 1). Any other value will emit a warning and no mTHP
> > + * collapse will be attempted
> > + *
> > + * Return: Maximum number of empty PTEs allowed for the collapse operation
> > + */
> > +static unsigned int collapse_max_ptes_none(unsigned int order, bool full_scan)
> > +{
> > +	/* ignore max_ptes_none limits */
> > +	if (full_scan)
> > +		return HPAGE_PMD_NR - 1;
> > +
> > +	if (!is_mthp_order(order))
> > +		return khugepaged_max_ptes_none;
> > +
> > +	/* Zero/non-present collapse disabled. */
> > +	if (!khugepaged_max_ptes_none)
> > +		return 0;
> > +
> > +	if (khugepaged_max_ptes_none == HPAGE_PMD_NR - 1)
> > +		return (1 << order) - 1;
> > +
> > +	pr_warn_once("mTHP collapse only supports max_ptes_none values of 0 or %d\n",
> > +		     HPAGE_PMD_NR - 1);
> > +	return -EINVAL;
> > +}
> > +
> >  void khugepaged_enter_vma(struct vm_area_struct *vma,
> >  			  vm_flags_t vm_flags)
> >  {
> > @@ -550,7 +588,10 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
> >  	pte_t *_pte;
> >  	int none_or_zero = 0, shared = 0, result = SCAN_FAIL, referenced = 0;
> >  	const unsigned long nr_pages = 1UL << order;
> > -	int max_ptes_none = khugepaged_max_ptes_none >> (HPAGE_PMD_ORDER - order);
> > +	int max_ptes_none = collapse_max_ptes_none(order, !cc->is_khugepaged);
> > +
> > +	if (max_ptes_none == -EINVAL)
> > +		goto out;
>
> After testing your patchset, I hit the following crash. The reason is
> that when 'max_ptes_none' is -EINVAL here, it shouldn't goto out to call
> release_pte_pages(), because the '_pte' hasn't been initialized at this
> point, and there's no need to release folios either.
>
> After applying the fix below, the crash issue is resolved. I'm not sure
> whether Andrew will help fix this or if you will send a new version to
> address this issue.
>
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index 8cffaf59ced8..2e8171a6d7df 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -646,7 +646,7 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
>  	int max_ptes_none = collapse_max_ptes_none(order, !cc->is_khugepaged);
>
>  	if (max_ptes_none == -EINVAL)
> -		goto out;
> +		return result;
>
>  	for (_pte = pte; _pte < pte + nr_pages;
>  	     _pte++, addr += PAGE_SIZE) {
>
> "
> [ 565.319345] Unable to handle kernel paging request at virtual address fffffffffffffffa
> .......
> [ 565.319409] swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000001f8549a000
> [ 565.319416] [fffffffffffffffa] pgd=0000001f85f2a403, p4d=0000001f85f2a403, pud=0000001f85f2b403, pmd=0000000000000000
> [ 565.319427] Internal error: Oops: 0000000096000006 [#1] SMP
> .......
> [ 565.326733] pc : release_pte_pages+0x68/0x178
> [ 565.326960] lr : __collapse_huge_page_isolate+0xc0/0x748
> [ 565.327232] sp : ffff800083593910
> .......
> [ 565.331476] Call trace:
> [ 565.331664]  release_pte_pages+0x68/0x178 (P)
> [ 565.331940]  __collapse_huge_page_isolate+0xc0/0x748
> [ 565.332249]  collapse_huge_page+0x4cc/0xa70
> [ 565.332510]  mthp_collapse+0x254/0x2a8
> [ 565.332754]  collapse_scan_pmd+0x5a0/0x6d8
> [ 565.333010]  collapse_single_pmd+0x214/0x288
> [ 565.333275]  collapse_scan_mm_slot.constprop.0+0x2ac/0x460
> [ 565.333617]  khugepaged+0x204/0x2c8
> [ 565.333992]  kthread+0xf8/0x110
> [ 565.334368]  ret_from_fork+0x10/0x20
> "
>
> >
> >  	for (_pte = pte; _pte < pte + nr_pages;
> >  	     _pte++, addr += PAGE_SIZE) {
>
