On Fri, May 22, 2026 at 8:59 AM Nico Pache <[email protected]> wrote:
>
> The following series provides khugepaged with the capability to collapse
> anonymous memory regions to mTHPs.
>
> To achieve this we generalize the khugepaged functions to no longer depend
> on PMD_ORDER. Then during the PMD scan, we use a bitmap to track individual
> pages that are occupied (!none/zero). After the PMD scan is done, we use
> the bitmap to find the optimal mTHP sizes for the PMD range. The
> restriction on max_ptes_none is removed during the scan, to make sure we
> account for the whole PMD range in the bitmap. When no mTHP size is
> enabled, the legacy behavior of khugepaged is maintained.
>
> We currently only support max_ptes_none values of 0 or HPAGE_PMD_NR - 1
> (ie 511). If any other value is specified, the kernel will emit a warning
> and mTHP collapse will default to max_ptes_none=0. If a mTHP collapse is
> attempted, but contains swapped out, or shared pages, we don't perform
> the collapse.
> It is now also possible to collapse to mTHPs without requiring the PMD THP
> size to be enabled. These limitations are to prevent collapse "creep"
> behavior. This prevents constantly promoting mTHPs to the next available
> size, which would occur because a collapse introduces more non-zero pages
> that would satisfy the promotion condition on subsequent scans.
>
> Patch 1-2: Generalize hugepage_vma_revalidate and alloc_charge_folio
>            for arbitrary orders.
> Patch 3: Rework max_ptes_* handling into helper functions
> Patch 4: Generalize __collapse_huge_page_* for mTHP support
> Patch 5: Require collapse_huge_page to enter/exit with the lock dropped
> Patch 6: Generalize collapse_huge_page for mTHP collapse
> Patch 7: Skip collapsing mTHP to smaller orders
> Patch 8-9: Add per-order mTHP statistics and tracepoints
> Patch 10: Introduce collapse_allowable_orders helper function
> Patch 11-13: Introduce bitmap and mTHP collapse support, fully enabled
> Patch 14: Documentation
>
> Testing:
> - Built for x86_64, aarch64, ppc64le, and s390x
> - ran all arches on test suites provided by the kernel-tests project
> - internal testing suites: functional testing and performance testing
> - selftests mm
> - I created a test script that I used to push khugepaged to its limits
>   while monitoring a number of stats and tracepoints. The code is
>   available here[1] (Run in legacy mode for these changes and set mthp
>   sizes to inherit)
>   The summary from my testings was that there was no significant
>   regression noticed through this test. In some cases my changes had
>   better collapse latencies, and was able to scan more pages in the same
>   amount of time/work, but for the most part the results were consistent.
> - redis testing. I did some testing with these changes along with my defer
>   changes (see followup [2] post for more details). We've decided to get
>   the mTHP changes merged first before attempting the defer series.
> - some basic testing on 64k page size.
> - lots of general use.
>
> [1] - https://gitlab.com/npache/khugepaged_mthp_test
> [2] - https://lore.kernel.org/lkml/[email protected]/
>
> V18 Changes:
> - Added RBs/Acks
> - [patch 02] Guard count_memcg_folio_events with is_pmd_order() to keep
>   THP_COLLAPSE_ALLOC PMD-only (Usama, Lance)
> - [patch 03] Convert C++ comments to C-style; fix "none-page" terminology
>   to "empty PTEs or PTEs mapping the shared zeropage"; drop unnecessary
>   userfaultfd comment; add const to local max_ptes_* variables; fix
>   "repect" typo (Lance, David)
> - [patch 04] collapse_max_ptes_none() now returns 0 instead of -EINVAL for
>   unsupported values; remove SCAN_INVALID_PTES_NONE; change return type
>   from int to unsigned int and propagate to all callers; add comment above
>   __collapse_huge_page_swapin explaining mTHP swap bail-out (David,
>   Lorenzo, Lance, Wei Yang, Usama)
> - [patch 05] Rewrite collapse_huge_page lock comment to David's suggested
>   wording (David)
> - [patch 11] Propagate unsigned int return type for max_ptes_none; remove
>   the now-unnecessary negative return check (consequence of patch 04);
>   Add optimization to the next_order goto that will prevent unnecessary
>   iterations if there are no lower orders enabled (Vernon); update locking
>   comment; pass VMA to mthp_collapse to improve uffd-armed detection, and
>   prevent unnecessary work. (Wei)
> - [patch 14] Update documentation to reflect fallback-to-0 behavior
>
> V17: https://lore.kernel.org/all/[email protected]
> V16: https://lore.kernel.org/all/[email protected]
> V15: https://lore.kernel.org/all/[email protected]
> V14: https://lore.kernel.org/all/[email protected]
> V13: https://lore.kernel.org/all/[email protected]
> V12: https://lore.kernel.org/all/[email protected]
> V11: https://lore.kernel.org/all/[email protected]
> V10: https://lore.kernel.org/all/[email protected]
> V9 : https://lore.kernel.org/all/[email protected]
> V8 : https://lore.kernel.org/all/[email protected]
> V7 : https://lore.kernel.org/all/[email protected]
> V6 : https://lore.kernel.org/all/[email protected]
> V5 : https://lore.kernel.org/all/[email protected]
> V4 : https://lore.kernel.org/all/[email protected]
> V3 : https://lore.kernel.org/all/[email protected]
> V2 : https://lore.kernel.org/all/[email protected]
> V1 : https://lore.kernel.org/all/[email protected]
>
> Baolin Wang (1):
>   mm/khugepaged: run khugepaged for all orders
>
> Dev Jain (1):
>   mm/khugepaged: generalize alloc_charge_folio()
>
> Nico Pache (12):
>   mm/khugepaged: generalize hugepage_vma_revalidate for mTHP support
>   mm/khugepaged: rework max_ptes_* handling with helper functions
>   mm/khugepaged: generalize __collapse_huge_page_* for mTHP support
>   mm/khugepaged: require collapse_huge_page to enter/exit with the lock
>     dropped
>   mm/khugepaged: generalize collapse_huge_page for mTHP collapse
>   mm/khugepaged: skip collapsing mTHP to smaller orders
>   mm/khugepaged: add per-order mTHP collapse failure statistics
>   mm/khugepaged: improve tracepoints for mTHP orders
>   mm/khugepaged: introduce collapse_allowable_orders helper function
>   mm/khugepaged: Introduce mTHP collapse support
>   mm/khugepaged: avoid unnecessary mTHP collapse attempts
>   Documentation: mm: update the admin guide for mTHP collapse
>
>  Documentation/admin-guide/mm/transhuge.rst |  72 ++-
>  include/linux/huge_mm.h                    |   5 +
>  include/trace/events/huge_memory.h         |  34 +-
>  mm/huge_memory.c                           |  11 +
>  mm/khugepaged.c                            | 634 ++++++++++++++++-----
>  5 files changed, 584 insertions(+), 172 deletions(-)
>
>
> base-commit: 6c8cb505a5634594b3ea159fd1c71bce2acf3346

Whoops, I manually changed the cover letter subject to reflect that this is
on mm-hotfixes-unstable but never updated the others... Hopefully that is
ok. Just a small mistake. The base commit is referenced here.

-- Nico

> --
> 2.54.0
>
