On 6/5/26 18:14, Nico Pache wrote:
> Enable khugepaged to collapse to mTHP orders. This patch implements the
> main scanning logic using a bitmap to track occupied pages and the
> algorithm to find optimal collapse sizes.
> 
> Prior to this patch, PMD collapse had three main phases: a lightweight
> scanning phase (mmap_read_lock) that identifies a potential PMD
> collapse, an allocation phase (mmap unlocked), and finally a heavier
> collapse phase (mmap_write_lock).
> 
> To enable mTHP collapse we make the following changes:
> 
> During the PMD scan phase, track occupied pages in a bitmap. When mTHP
> orders are enabled, we remove the restriction of max_ptes_none during the
> scan phase to avoid missing potential mTHP collapse candidates. Once we
> have scanned the full PMD range and updated the bitmap to track occupied
> pages, we use the bitmap to find the optimal mTHP size.
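A minimal userspace sketch of the occupancy bitmap idea, assuming 4 KiB pages and a 2 MiB PMD (512 PTEs). The names present_map, scan_range and map_weight are hypothetical; they only play the roles that cc->mthp_present_ptes, bitmap_zero() and bitmap_weight_from() play in the patch, and are not the kernel API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_PMD_NR 512u                    /* assumption: 4 KiB pages, 2 MiB PMD */
#define WORD_BITS     64u

/* Hypothetical stand-in for cc->mthp_present_ptes: one bit per PTE. */
struct present_map {
	uint64_t words[SKETCH_PMD_NR / WORD_BITS];
};

/* Mark PTE idx as occupied. */
static void map_set(struct present_map *m, unsigned int idx)
{
	m->words[idx / WORD_BITS] |= UINT64_C(1) << (idx % WORD_BITS);
}

/* Is PTE idx marked as occupied? */
static bool map_test(const struct present_map *m, unsigned int idx)
{
	return (m->words[idx / WORD_BITS] >> (idx % WORD_BITS)) & 1u;
}

/* Scan phase: clear the map (like bitmap_zero()), then record occupied PTEs. */
static void scan_range(struct present_map *m, const bool pte_present[])
{
	memset(m->words, 0, sizeof(m->words));
	for (unsigned int i = 0; i < SKETCH_PMD_NR; i++)
		if (pte_present[i])
			map_set(m, i);
}

/* Occupied PTEs in [start, start + nr): the count a later walk compares. */
static unsigned int map_weight(const struct present_map *m,
			       unsigned int start, unsigned int nr)
{
	unsigned int n = 0;

	for (unsigned int i = start; i < start + nr; i++)
		n += map_test(m, i);
	return n;
}
```

Because the whole PMD range is recorded once, any naturally aligned sub-range can later be evaluated without re-walking the page tables.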
> 
> Implement mthp_collapse() to walk forward through the bitmap and
> determine the best eligible order for each naturally-aligned region. The
> algorithm starts at the beginning of the PMD range and, for each offset,
> tries the highest order that fits the alignment. If the number of
> occupied PTEs in that region satisfies the max_ptes_none threshold for
> that order, a collapse is attempted. On failure, the order is
> decremented and the same offset is retried at the next smaller size. Once
> the smallest enabled order is exhausted (or a collapse succeeds), the
> offset advances past the region just processed, and the next attempt
> starts at the highest order permitted by the new offset's natural
> alignment.
> 
> The algorithm works as follows:
>     1) set offset=0 and order=HPAGE_PMD_ORDER
>     2) if the order is not enabled, go to step (5)
>     3) count occupied PTEs in the (offset, order) range using
>        bitmap_weight_from()
>     4) if the count satisfies the max_ptes_none threshold, attempt
>        collapse; on success, advance to step (6)
>     5) if a smaller enabled order exists, decrement order and retry
>        from step (2) at the same offset; otherwise, go to step (6)
>     6) advance offset past the current region and compute the next
>        order from the new offset's natural alignment via __ffs(offset),
>        capped at HPAGE_PMD_ORDER
>     7) repeat from step (2) until the full PMD range is covered
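The seven steps above can be sketched in userspace C as follows. This is a minimal sketch under stated assumptions, not the patch's mthp_collapse(): the PMD covers 512 PTEs, occupancy is a plain bool array instead of a bitmap, max_none[] holds a per-order none-PTE budget supplied by the caller, and mthp_walk, collapse_fn and collapse_always_succeeds are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_PMD_ORDER 9u                   /* assumption: 4 KiB pages, 2 MiB PMD */
#define SKETCH_PMD_NR    (1u << SKETCH_PMD_ORDER)
#define SKETCH_ANON_MASK 0x3fcul              /* orders 2..9, minimum anon order 2 */

struct collapse_rec {
	unsigned int offset;                  /* first PTE index of the region */
	unsigned int order;                   /* region spans 1 << order PTEs */
};

/* Hypothetical collapse hook; returns true if the collapse succeeded. */
typedef bool (*collapse_fn)(unsigned int offset, unsigned int order);

static bool collapse_always_succeeds(unsigned int offset, unsigned int order)
{
	(void)offset;
	(void)order;
	return true;
}

/* Index of the lowest set bit of a non-zero value, as __ffs() returns. */
static unsigned int lowest_bit(unsigned long x)
{
	unsigned int n = 0;

	while (!(x & 1ul)) {
		x >>= 1;
		n++;
	}
	return n;
}

/* Occupied PTEs in [start, start + nr). */
static unsigned int occupied(const bool present[], unsigned int start, unsigned int nr)
{
	unsigned int n = 0;

	for (unsigned int i = start; i < start + nr; i++)
		n += present[i];
	return n;
}

/*
 * enabled is a bitmask of orders, max_none[k] the none-PTE budget for
 * order k. Returns the number of successful collapses and records the
 * first cap of them in out[].
 */
static unsigned int mthp_walk(const bool present[], unsigned long enabled,
			      const unsigned int max_none[], collapse_fn try_collapse,
			      struct collapse_rec out[], unsigned int cap)
{
	unsigned int offset = 0, order = SKETCH_PMD_ORDER, n = 0, min_order;

	enabled &= SKETCH_ANON_MASK;
	if (!enabled)
		return 0;
	min_order = lowest_bit(enabled);          /* smallest enabled order */

	while (offset < SKETCH_PMD_NR) {          /* step 1 / step 7 */
		for (;;) {
			unsigned int nr = 1u << order;

			/* Steps 2-4: enabled order, within budget, collapse. */
			if ((enabled & (1ul << order)) &&
			    nr - occupied(present, offset, nr) <= max_none[order] &&
			    try_collapse(offset, order)) {
				if (n < cap)
					out[n] = (struct collapse_rec){ offset, order };
				n++;
				break;
			}
			if (order == min_order)
				break;            /* step 5: no smaller order left */
			order--;                  /* step 5: retry one order lower */
		}
		/* Step 6: skip this region, restart at the natural alignment. */
		offset += 1u << order;
		if (offset < SKETCH_PMD_NR) {
			order = lowest_bit(offset);
			if (order > SKETCH_PMD_ORDER)
				order = SKETCH_PMD_ORDER;
		}
	}
	return n;
}
```

Since every advance is by 1 << order with order no smaller than the smallest enabled order, the offset stays aligned to that order and each region tried is naturally aligned. With only PTEs 0..15 populated, orders 2..9 enabled and a zero budget, the walk records a single order-4 collapse at offset 0.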
> 
> mTHP collapse rejects regions containing swapped-out or shared pages.
> This is because adding new entries can lead to new none pages, and these
> may lead to constant promotion into a higher-order mTHP. A similar
> issue can occur with "max_ptes_none > HPAGE_PMD_NR/2": a collapse fills
> its whole region, which leaves the enclosing region of twice the size at
> least half populated, so a future scan will satisfy the promotion
> condition once again. This issue is prevented by the
> collapse_max_ptes_none() function, which restricts the max_ptes_none
> values usable for mTHP collapse.
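A small worked model of the promotion creep described above. It assumes, purely for illustration, that the none-PTE budget for an order is max_ptes_none scaled down by a right shift of (HPAGE_PMD_ORDER - order); that scaling and the names scaled_max_none and promote_until_stable are hypothetical, not taken from the patch. Starting from one fully populated order-2 region, each "scan" checks whether the enclosing region of twice the size fits the budget; after a collapse, that larger region is exactly half populated.

```c
#include <assert.h>

#define SKETCH_PMD_ORDER 9u                   /* assumption: 4 KiB pages, 2 MiB PMD */
#define SKETCH_MIN_ORDER 2u                   /* minimum anon mTHP order */

/* Assumed per-order budget: max_ptes_none scaled to the order's size. */
static unsigned int scaled_max_none(unsigned int max_ptes_none, unsigned int order)
{
	return max_ptes_none >> (SKETCH_PMD_ORDER - order);
}

/*
 * Begin with one fully populated SKETCH_MIN_ORDER region and nothing else,
 * then keep "rescanning": if the enclosing region fits the budget, it is
 * collapsed (and thereby fully populated). Returns the final order.
 */
static unsigned int promote_until_stable(unsigned int max_ptes_none)
{
	unsigned int order = SKETCH_MIN_ORDER;

	while (order < SKETCH_PMD_ORDER) {
		unsigned int next = order + 1;
		/* The populated half leaves the other half as none PTEs. */
		unsigned int none = (1u << next) - (1u << order);

		if (none > scaled_max_none(max_ptes_none, next))
			break;                    /* stable: larger region rejected */
		order = next;                     /* collapse doubles populated pages */
	}
	return order;
}
```

Under this model a value such as 300 (above HPAGE_PMD_NR/2) lets an order-2 region creep step by step up to PMD order even though no new memory was touched, while 0 or 128 leaves it at order 2.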
> 
> We currently only support mTHP collapse for max_ptes_none values of 0
> and HPAGE_PMD_NR - 1, resulting in the following behavior:
> 
>     - max_ptes_none=0: Never introduce new empty pages during collapse
>     - max_ptes_none=HPAGE_PMD_NR-1: Always try collapse to the highest
>       available mTHP order
> 
> Any other max_ptes_none value will emit a warning and default mTHP
> collapse to max_ptes_none=0. There should be no behavior change for PMD
> collapse.
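The max_ptes_none policy above can be modelled as a small function. This is a hypothetical sketch of the described behavior, not collapse_max_ptes_none() itself: sketch_max_ptes_none is an invented name, and it assumes that HPAGE_PMD_NR - 1 translates to "every PTE but one may be none" for a smaller order, i.e. a budget of (1 << order) - 1.

```c
#include <assert.h>
#include <stdio.h>

#define SKETCH_PMD_ORDER 9u                   /* assumption: 4 KiB pages, 2 MiB PMD */
#define SKETCH_PMD_NR    (1u << SKETCH_PMD_ORDER)

/*
 * PMD collapse keeps the raw tunable; mTHP collapse only honours 0 and
 * HPAGE_PMD_NR - 1 and falls back to 0, with a warning, otherwise.
 */
static unsigned int sketch_max_ptes_none(unsigned int max_ptes_none, unsigned int order)
{
	if (order == SKETCH_PMD_ORDER)
		return max_ptes_none;             /* no change for PMD collapse */
	if (max_ptes_none == SKETCH_PMD_NR - 1)
		return (1u << order) - 1;         /* collapse to highest order */
	if (max_ptes_none != 0)
		fprintf(stderr, "max_ptes_none=%u unsupported for mTHP, using 0\n",
			max_ptes_none);
	return 0;                                 /* never add new empty pages */
}
```

For example, with max_ptes_none=100 an order-4 collapse gets a budget of 0 (after the warning), while PMD collapse still uses 100.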
> 
> Once we determine which mTHP sizes fit best in that PMD range, a collapse
> is attempted. A minimum collapse order of 2 is used as this is the lowest
> order supported by anon memory as defined by THP_ORDERS_ALL_ANON.
> 
> Currently, madv_collapse does not support mTHP collapse and will only
> attempt PMD collapse.
> 
> We can also remove the check for is_khugepaged inside the PMD scan as
> the collapse_max_ptes_none() function handles this logic now.
> 
> Signed-off-by: Nico Pache <[email protected]>
> ---

Yeah, overall much simpler and much easier to get. As discussed, we can optimize
this later to traverse enabled orders more efficiently.

> +     bitmap_zero(cc->mthp_present_ptes, MAX_PTRS_PER_PTE);
>       memset(cc->node_load, 0, sizeof(cc->node_load));
>       nodes_clear(cc->alloc_nmask);
> +
> +     enabled_orders = collapse_possible_orders(vma, vma->vm_flags,
> +                                               tva_flags);
> +
> +     /*
> +      * If PMD is the only enabled order, enforce max_ptes_none, otherwise
> +      * scan all pages to populate the bitmap for mTHP collapse.
> +      */
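A hypothetical userspace model of the scan-phase decision in the comment above: when only the PMD order is enabled, the scan can bail out as soon as max_ptes_none is exceeded; otherwise it must visit every PTE so the bitmap is complete. The function scan_ptes and the test for "PMD only" (enabled_orders equal to the single PMD bit) are assumptions for illustration, not the patch's code.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_PMD_ORDER 9u                   /* assumption: 4 KiB pages, 2 MiB PMD */
#define SKETCH_PMD_NR    (1u << SKETCH_PMD_ORDER)

/*
 * Returns how many PTEs were examined before the scan gave up;
 * SKETCH_PMD_NR means the whole range was visited.
 */
static unsigned int scan_ptes(const bool present[], unsigned long enabled_orders,
			      unsigned int max_ptes_none)
{
	/* PMD only: keep the early bail-out; otherwise never bail out. */
	bool pmd_only = enabled_orders == (1ul << SKETCH_PMD_ORDER);
	unsigned int budget = pmd_only ? max_ptes_none : SKETCH_PMD_NR;
	unsigned int none = 0;

	for (unsigned int i = 0; i < SKETCH_PMD_NR; i++) {
		if (!present[i] && ++none > budget)
			return i;                 /* too sparse for a PMD collapse */
		/* A bitmap update for occupied PTEs would go here. */
	}
	return SKETCH_PMD_NR;
}
```

With only PTE 0 populated and max_ptes_none=0, the PMD-only scan stops at PTE 1, whereas enabling any mTHP order makes the same scan cover all 512 PTEs.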

I think it would have been good to mention where the check is performed for mTHP
collapse. Can be added later.


Acked-by: David Hildenbrand (Arm) <[email protected]>

-- 
Cheers,

David
