On 2025/12/2 01:46, Nico Pache wrote:
The current mechanism for determining mTHP collapse scales the
khugepaged_max_ptes_none value based on the target order. This
introduces an undesirable feedback loop, or "creep", when max_ptes_none
is set to a value greater than HPAGE_PMD_NR / 2.

With this configuration, a successful collapse to order N will populate
enough pages to satisfy the collapse condition on order N+1 on the next
scan. This leads to unnecessary work and memory churn.

To fix this issue introduce a helper function that will limit mTHP
collapse support to two max_ptes_none values, 0 and HPAGE_PMD_NR - 1.
This effectively supports two modes:

- max_ptes_none=0: never introduce new none-pages for mTHP collapse.
- max_ptes_none=511 (on 4k pagesz): Always collapse to the highest
   available mTHP order.

This removes the possiblilty of "creep", while not modifying any uAPI
expectations. A warning will be emitted if any non-supported
max_ptes_none value is configured with mTHP enabled.

The limits can be ignored by passing full_scan=true, this is useful for
madvise_collapse (which ignores limits), or in the case of
collapse_scan_pmd(), allows the full PMD to be scanned when mTHP
collapse is available.

Signed-off-by: Nico Pache <[email protected]>
---
  mm/khugepaged.c | 43 ++++++++++++++++++++++++++++++++++++++++++-
  1 file changed, 42 insertions(+), 1 deletion(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 8dab49c53128..f425238d5d4f 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -463,6 +463,44 @@ void __khugepaged_enter(struct mm_struct *mm)
                wake_up_interruptible(&khugepaged_wait);
  }
+/**
+ * collapse_max_ptes_none - Calculate maximum allowed empty PTEs for collapse
+ * @order: The folio order being collapsed to
+ * @full_scan: Whether this is a full scan (ignore limits)
+ *
+ * For madvise-triggered collapses (full_scan=true), all limits are bypassed
+ * and allow up to HPAGE_PMD_NR - 1 empty PTEs.
+ *
+ * For PMD-sized collapses (order == HPAGE_PMD_ORDER), use the configured
+ * khugepaged_max_ptes_none value.
+ *
+ * For mTHP collapses, we currently only support khugepaged_max_pte_none values
+ * of 0 or (HPAGE_PMD_NR - 1). Any other value will emit a warning and no mTHP
+ * collapse will be attempted
+ *
+ * Return: Maximum number of empty PTEs allowed for the collapse operation
+ */
+static unsigned int collapse_max_ptes_none(unsigned int order, bool full_scan)
+{
+       /* ignore max_ptes_none limits */
+       if (full_scan)
+               return HPAGE_PMD_NR - 1;
+
+       if (!is_mthp_order(order))
+               return khugepaged_max_ptes_none;
+
+       /* Zero/non-present collapse disabled. */
+       if (!khugepaged_max_ptes_none)
+               return 0;
+
+       if (khugepaged_max_ptes_none == HPAGE_PMD_NR - 1)
+               return (1 << order) - 1;
+
+       pr_warn_once("mTHP collapse only supports max_ptes_none values of 0 or 
%d\n",
+                     HPAGE_PMD_NR - 1);
+       return -EINVAL;
+}

Thanks. That aligns with what we talked about previously. So
Reviewed-by: Baolin Wang <[email protected]>

Reply via email to