On Mon, May 11, 2026 at 12:58:00PM -0600, Nico Pache wrote:
>The following series provides khugepaged with the capability to collapse
>anonymous memory regions to mTHPs.
>
>To achieve this we generalize the khugepaged functions to no longer depend
>on PMD_ORDER. Then during the PMD scan, we use a bitmap to track individual
>pages that are occupied (!none/zero). After the PMD scan is done, we use
>the bitmap to find the optimal mTHP sizes for the PMD range. The
>restriction on max_ptes_none is removed during the scan, to make sure we
>account for the whole PMD range in the bitmap. When no mTHP size is
>enabled, the legacy behavior of khugepaged is maintained.
>
>We currently only support max_ptes_none values of 0 or HPAGE_PMD_NR - 1
>(ie 511). If any other value is specified, the kernel will emit a warning
>and no mTHP collapse will be attempted. If a mTHP collapse is attempted,
>but contains swapped out, or shared pages, we don't perform the collapse.
>It is now also possible to collapse to mTHPs without requiring the PMD THP
>size to be enabled. These limitations are to prevent collapse "creep"
>behavior. This prevents constantly promoting mTHPs to the next available
>size, which would occur because a collapse introduces more non-zero pages
>that would satisfy the promotion condition on subsequent scans.
>
>Patch 1-2:   Generalize hugepage_vma_revalidate and alloc_charge_folio
>            for arbitrary orders.
>Patch 3:     Rework max_ptes_* handling into helper functions
>Patch 4:     Generalize __collapse_huge_page_* for mTHP support
>Patch 5:     Require collapse_huge_page to enter/exit with the lock dropped
>Patch 6:     Generalize collapse_huge_page for mTHP collapse
>Patch 7:     Skip collapsing mTHP to smaller orders
>Patch 8-9:   Add per-order mTHP statistics and tracepoints
>Patch 10:    Introduce collapse_allowable_orders helper function
>Patch 11-13: Introduce bitmap and mTHP collapse support, fully enabled
>Patch 14:    Documentation
>
>Testing:
>- Built for x86_64, aarch64, ppc64le, and s390x
>- ran all arches on test suites provided by the kernel-tests project
>- internal testing suites: functional testing and performance testing
>- selftests mm
>- I created a test script that I used to push khugepaged to its limits
>   while monitoring a number of stats and tracepoints. The code is
>   available here[1] (Run in legacy mode for these changes and set mthp
>   sizes to inherit)
>   The summary from my testings was that there was no significant
>   regression noticed through this test. In some cases my changes had
>   better collapse latencies, and was able to scan more pages in the same
>   amount of time/work, but for the most part the results were consistent.
>- redis testing. I did some testing with these changes along with my defer
>  changes (see followup [2] post for more details). We've decided to get
>  the mTHP changes merged first before attempting the defer series.
>- some basic testing on 64k page size.
>- lots of general use.
>

The two links are missing. I took them from the previous version.

[1] - https://gitlab.com/npache/khugepaged_mthp_test
[2] - https://lore.kernel.org/lkml/[email protected]/

The test in [1] is a performance test. I am wondering whether we also want
a functional test in selftests.

I gave it a quick try with the following change plus some hacks.

@@ -744,6 +765,51 @@ static void collapse_max_ptes_none(struct collapse_context *c, struct mem_ops *o
        ksft_test_result_report(exit_status, "%s\n", __func__);
 }
 
+static void collapse_mth_ptes(struct collapse_context *c, struct mem_ops *ops)
+{
+       struct thp_settings settings = *thp_current_settings();
+       void *p;
+       int i;
+
+       /* Disable mthp on fault */
+       for (i = 0; i < NR_ORDERS; i++) {
+               settings.hugepages[i].enabled = THP_NEVER;
+       }
+       thp_push_settings(&settings);
+
+       p = ops->setup_area(1);
+
+       ops->fault(p, 0, hpage_pmd_size);
+
+       /* Expect all order-0 folio after fault */
+       memset(expected_orders, 0, sizeof(int) * (pmd_order + 1));
+       expected_orders[0] = hpage_pmd_nr;
+       if (check_folio_orders(p, hpage_pmd_size, pagemap_fd,
+                                          kpageflags_fd, expected_orders,
+                                          (pmd_order + 1)))
+               ksft_exit_fail_msg("Unexpected huge page at fault\n");
+
+       /* Enable mthp before collapse */
+       thp_pop_settings();
+       settings.hugepages[2].enabled = THP_ALWAYS;
+       thp_push_settings(&settings);
+
+       c->collapse("Collapse fully populated PTE table with order 2", p, 1,
+                   ops, true);
+
+       /* Expect all order-2 folio after collapse */
+       memset(expected_orders, 0, sizeof(int) * (pmd_order + 1));
+       expected_orders[2] = 1 << (pmd_order - 2);
+       if (check_folio_orders(p, hpage_pmd_size, pagemap_fd,
+                                          kpageflags_fd, expected_orders,
+                                          (pmd_order + 1)))
+               ksft_exit_fail_msg("Unexpected page order\n");
+
+       ops->cleanup_area(p, hpage_pmd_size);
+       thp_pop_settings();
+       ksft_test_result_report(exit_status, "%s\n", __func__);
+}
+
 static void collapse_swapin_single_pte(struct collapse_context *c, struct mem_ops *ops)
 {
        void *p;

This leverages check_after_split_folio_orders() in split_huge_page_test.c to
check the folio orders in the PMD range.
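
For reference, the expected_orders[] values in the test come from dividing
the PMD range by the folio size. Below is a minimal sketch of that
arithmetic, assuming the range is fully populated and every page ends up in
a folio of the same order. fill_expected_orders() is a made-up helper name
for illustration; it does not exist in the selftests.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of the selftests): fill expected[] with the
 * folio counts for a fully populated PMD range in which every page belongs
 * to a folio of the given order. expected[] must have pmd_order + 1 slots.
 */
static void fill_expected_orders(int *expected, int pmd_order, int order)
{
	assert(order >= 0 && order <= pmd_order);

	/* One slot per order, 0..pmd_order inclusive. */
	memset(expected, 0, sizeof(int) * (size_t)(pmd_order + 1));

	/* 2^pmd_order base pages, grouped into folios of 2^order pages each. */
	expected[order] = 1 << (pmd_order - order);
}
```

With pmd_order = 9 (4K pages on x86_64), order 0 gives 512 entries, which
is the hpage_pmd_nr used in the first check, and order 2 gives 128, which is
what the second check expects after the collapse.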

-- 
Wei Yang
Help you, Help me
