On 6/5/26 18:14, Nico Pache wrote:
> Enable khugepaged to collapse to mTHP orders. This patch implements the
> main scanning logic using a bitmap to track occupied pages and the
> algorithm to find optimal collapse sizes.
>
> Previous to this patch, PMD collapse had 3 main phases, a lightweight
> scanning phase (mmap_read_lock) that determines a potential PMD
> collapse, an alloc phase (mmap unlocked), then finally heavier collapse
> phase (mmap_write_lock).
>
> To enable mTHP collapse we make the following changes:
>
> During PMD scan phase, track occupied pages in a bitmap. When mTHP
> orders are enabled, we remove the restriction of max_ptes_none during the
> scan phase to avoid missing potential mTHP collapse candidates. Once we
> have scanned the full PMD range and updated the bitmap to track occupied
> pages, we use the bitmap to find the optimal mTHP size.
>
> Implement mthp_collapse() to walk forward through the bitmap and
> determine the best eligible order for each naturally-aligned region. The
> algorithm starts at the beginning of the PMD range and, for each offset,
> tries the highest order that fits the alignment. If the number of
> occupied PTEs in that region satisfies the max_ptes_none threshold for
> that order, a collapse is attempted. On failure, the order is
> decremented and the same offset is retried at the next smaller size. Once
> the smallest enabled order is exhausted (or a collapse succeeds), the
> offset advances past the region just processed, and the next attempt
> starts at the highest order permitted by the new offset's natural
> alignment.
>
> The algorithm works as follows:
> 1) set offset=0 and order=HPAGE_PMD_ORDER
> 2) if the order is not enabled, go to step (5)
> 3) count occupied PTEs in the (offset, order) range using
>    bitmap_weight_from()
> 4) if the count satisfies the max_ptes_none threshold, attempt
>    collapse; on success, advance to step (6)
> 5) if a smaller enabled order exists, decrement order and retry
>    from step (2) at the same offset
> 6) advance offset past the current region and compute the next
>    order from the new offset's natural alignment via __ffs(offset),
>    capped at HPAGE_PMD_ORDER
> 7) repeat from step (2) until the full PMD range is covered
>
> mTHP collapses reject regions containing swapped out or shared pages.
> This is because adding new entries can lead to new none pages, and these
> may lead to constant promotion into a higher order mTHP. A similar
> issue can occur with "max_ptes_none > HPAGE_PMD_NR/2" due to a collapse
> introducing at least 2x the number of pages, and on a future scan will
> satisfy the promotion condition once again. This issue is prevented via
> the collapse_max_ptes_none() function which imposes the max_ptes_none
> restrictions above.
>
> We currently only support mTHP collapse for max_ptes_none values of 0
> and HPAGE_PMD_NR - 1, resulting in the following behavior:
>
> - max_ptes_none=0: Never introduce new empty pages during collapse
> - max_ptes_none=HPAGE_PMD_NR-1: Always try collapse to the highest
>   available mTHP order
>
> Any other max_ptes_none value will emit a warning and default mTHP
> collapse to max_ptes_none=0. There should be no behavior change for PMD
> collapse.
>
> Once we determine what mTHP size fits best in that PMD range a collapse
> is attempted. A minimum collapse order of 2 is used as this is the lowest
> order supported by anon memory as defined by THP_ORDERS_ALL_ANON.
>
> Currently madv_collapse is not supported and will only attempt PMD
> collapse.
>
> We can also remove the check for is_khugepaged inside the PMD scan as
> the collapse_max_ptes_none() function handles this logic now.
>
> Signed-off-by: Nico Pache <[email protected]>
> ---
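
For readers following along, steps 1) to 7) of the quoted description can be modelled in userspace roughly as below. This is only a sketch under stated assumptions, not the code from the patch: a plain bool array stands in for the kernel bitmap, every attempted collapse is assumed to succeed, and mthp_walk(), order_at(), max_none_for() and struct collapse are hypothetical names. The per-order scaling of max_ptes_none is also an assumption inferred from the two supported values (0 and HPAGE_PMD_NR - 1).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PMD_ORDER 9                 /* stand-in for HPAGE_PMD_ORDER (4K pages, 2M PMD) */
#define PMD_NR    (1u << PMD_ORDER) /* stand-in for HPAGE_PMD_NR */
#define MIN_ORDER 2                 /* lowest anon mTHP order named in the description */

struct collapse { unsigned int offset, order; }; /* hypothetical result record */

/* Step 3: count present PTEs in [off, off + nr). */
static unsigned int count_present(const bool *present, unsigned int off,
				  unsigned int nr)
{
	unsigned int i, n = 0;

	for (i = 0; i < nr; i++)
		n += present[off + i];
	return n;
}

/*
 * ASSUMPTION: PMD order uses max_ptes_none unchanged; for mTHP orders,
 * HPAGE_PMD_NR - 1 scales to "all but one PTE may be none" and every
 * other value is treated as 0.
 */
static unsigned int max_none_for(unsigned int max_ptes_none, unsigned int order)
{
	if (order == PMD_ORDER)
		return max_ptes_none;
	if (max_ptes_none == PMD_NR - 1)
		return (1u << order) - 1;
	return 0;
}

/* Step 6: highest order allowed by the offset's natural alignment, capped. */
static unsigned int order_at(unsigned int off)
{
	unsigned int order = 0;

	if (off == 0)
		return PMD_ORDER;
	while (!(off & 1u) && order < PMD_ORDER) {
		off >>= 1;
		order++;
	}
	return order;
}

/* Walk one PMD range; returns the number of (assumed successful) collapses. */
size_t mthp_walk(const bool *present, unsigned long enabled,
		 unsigned int max_ptes_none, struct collapse *out, size_t cap)
{
	unsigned int offset = 0;
	size_t n = 0;

	/* Only orders MIN_ORDER..PMD_ORDER are meaningful in this model. */
	enabled &= ((1ul << (PMD_ORDER + 1)) - 1) & ~((1ul << MIN_ORDER) - 1);

	while (offset < PMD_NR) {                        /* step 7 */
		unsigned int order = order_at(offset);   /* steps 1 and 6 */

		for (;;) {
			unsigned int nr = 1u << order;

			assert(offset % nr == 0);        /* naturally aligned */
			if ((enabled >> order) & 1) {    /* step 2 */
				unsigned int none = nr - count_present(present, offset, nr);

				if (none <= max_none_for(max_ptes_none, order)) {
					if (n < cap)     /* step 4: record the collapse */
						out[n] = (struct collapse){ offset, order };
					n++;
					break;
				}
			}
			if (!(enabled & ((1ul << order) - 1)))
				break;                   /* step 5: nothing smaller */
			order--;
		}
		offset += 1u << order;                   /* step 6 */
	}
	return n;
}
```

With only PTEs 16..31 present, all orders 2..9 enabled and max_ptes_none=0, this model reports a single order-4 collapse at offset 16. With max_ptes_none=HPAGE_PMD_NR-1 the same input collapses the whole range at PMD order instead.
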

Yeah, overall much simpler and much easier to get. As discussed, we can
optimize this later to traverse enabled orders more efficiently.

> +	bitmap_zero(cc->mthp_present_ptes, MAX_PTRS_PER_PTE);
>  	memset(cc->node_load, 0, sizeof(cc->node_load));
>  	nodes_clear(cc->alloc_nmask);
> +
> +	enabled_orders = collapse_possible_orders(vma, vma->vm_flags, tva_flags);
> +
> +	/*
> +	 * If PMD is the only enabled order, enforce max_ptes_none, otherwise
> +	 * scan all pages to populate the bitmap for mTHP collapse.
> +	 */

I think it would have been good to mention where the check is performed
for mTHP collapse. Can be added later.

Acked-by: David Hildenbrand (Arm) <[email protected]>

-- 
Cheers,

David
