On Thu, Jan 22, 2026 at 12:28:32PM -0700, Nico Pache wrote: > The current mechanism for determining mTHP collapse scales the > khugepaged_max_ptes_none value based on the target order. This > introduces an undesirable feedback loop, or "creep", when max_ptes_none > is set to a value greater than HPAGE_PMD_NR / 2. > > With this configuration, a successful collapse to order N will populate > enough pages to satisfy the collapse condition on order N+1 on the next > scan. This leads to unnecessary work and memory churn. > > To fix this issue introduce a helper function that will limit mTHP > collapse support to two max_ptes_none values, 0 and HPAGE_PMD_NR - 1. > This effectively supports two modes: > > - max_ptes_none=0: never introduce new none-pages for mTHP collapse. > - max_ptes_none=511 (on 4k pagesz): Always collapse to the highest > available mTHP order. > > This removes the possiblilty of "creep", while not modifying any uAPI > expectations. A warning will be emitted if any non-supported > max_ptes_none value is configured with mTHP enabled. > > The limits can be ignored by passing full_scan=true, this is useful for > madvise_collapse (which ignores limits), or in the case of > collapse_scan_pmd(), allows the full PMD to be scanned when mTHP > collapse is available.
Thanks, great commit msg! > > Reviewed-by: Baolin Wang <[email protected]> > Signed-off-by: Nico Pache <[email protected]> This LGTM in terms of logic, some nits below, with those addressed feel free to add: Reviewed-by: Lorenzo Stoakes <[email protected]> Cheers, Lorenzo > --- > mm/khugepaged.c | 43 ++++++++++++++++++++++++++++++++++++++++++- > 1 file changed, 42 insertions(+), 1 deletion(-) > > diff --git a/mm/khugepaged.c b/mm/khugepaged.c > index 0f68902edd9a..9b7e05827749 100644 > --- a/mm/khugepaged.c > +++ b/mm/khugepaged.c > @@ -460,6 +460,44 @@ void __khugepaged_enter(struct mm_struct *mm) > wake_up_interruptible(&khugepaged_wait); > } > > +/** > + * collapse_max_ptes_none - Calculate maximum allowed empty PTEs for collapse > + * @order: The folio order being collapsed to > + * @full_scan: Whether this is a full scan (ignore limits) > + * > + * For madvise-triggered collapses (full_scan=true), all limits are bypassed > + * and allow up to HPAGE_PMD_NR - 1 empty PTEs. > + * > + * For PMD-sized collapses (order == HPAGE_PMD_ORDER), use the configured > + * khugepaged_max_ptes_none value. > + * > + * For mTHP collapses, we currently only support khugepaged_max_pte_none > values > + * of 0 or (HPAGE_PMD_NR - 1). Any other value will emit a warning and no > mTHP > + * collapse will be attempted > + * > + * Return: Maximum number of empty PTEs allowed for the collapse operation > + */ > +static unsigned int collapse_max_ptes_none(unsigned int order, bool > full_scan) > +{ > + /* ignore max_ptes_none limits */ > + if (full_scan) > + return HPAGE_PMD_NR - 1; I wonder if, given we are effectively doing: const unsigned int nr_pages = collapse_max_ptes_none(order, /*full_scan=*/true); ... foo(nr_pages); In places where we ignore limits, whether we would be better off putting HPAGE_PMD_NR - 1 into a define and just using that in these cases, like: #define COLLAPSE_MAX_PTES_LIM (HPAGE_PMD_NR - 1) Then instead doing: foo(COLLAPSE_MAX_PTES_LIM); ? Seems somewhat silly to pass in a boolean that makes it return a set value in cases where you know that should be the case at the outset. > + > + if (is_pmd_order(order)) > + return khugepaged_max_ptes_none; > + > + /* Zero/non-present collapse disabled. */ > + if (!khugepaged_max_ptes_none) > + return 0; > + > + if (khugepaged_max_ptes_none == HPAGE_PMD_NR - 1) Having a define for HPAGE_PMD_NR - 1 would also be handy here... > + return (1 << order) - 1; > + > + pr_warn_once("mTHP collapse only supports max_ptes_none values of 0 or > %d\n", > + HPAGE_PMD_NR - 1); ...and here. Also a MICRO nit here - the function returns unsigned int and thus we express PTEs in this unit, so maybe use %u rather than %d? > + return -EINVAL; > +} Logic of this function looks correct though! > + > void khugepaged_enter_vma(struct vm_area_struct *vma, > vm_flags_t vm_flags) > { > @@ -548,7 +586,10 @@ static enum scan_result > __collapse_huge_page_isolate(struct vm_area_struct *vma, > int none_or_zero = 0, shared = 0, referenced = 0; > enum scan_result result = SCAN_FAIL; > const unsigned long nr_pages = 1UL << order; > - int max_ptes_none = khugepaged_max_ptes_none >> (HPAGE_PMD_ORDER - > order); > + int max_ptes_none = collapse_max_ptes_none(order, !cc->is_khugepaged); Yeah, the !cc->is_khugepaged is a bit gross here, so as per the above, maybe do: int max_ptes_none; if (cc->is_khugepaged) max_ptes_none = collapse_max_ptes_none(order); else /* MADV_COLLAPSE is not limited. */ max_ptes_none = COLLAPSE_MAX_PTES_LIM; > + > + if (max_ptes_none == -EINVAL) > + return result; > > for (_pte = pte; _pte < pte + nr_pages; > _pte++, addr += PAGE_SIZE) { > -- > 2.52.0 >
