On Tue, Jun 9, 2026 at 3:26 AM David Hildenbrand (Arm) <[email protected]> wrote:
>
> On 6/9/26 11:06, Nico Pache wrote:
> > On Mon, Jun 8, 2026 at 8:57 AM David Hildenbrand (Arm) <[email protected]> 
> > wrote:
> >>
> >> On 6/6/26 12:28, Lance Yang wrote:
> >>>
> >>>
> >>> Looks broken for swap PTEs in PMD collapse ...
> >>>
> >>> collapse_scan_pmd() allows them up to max_ptes_swap and record them in
> >>> unmapped, but they don't get a bit in mthp_present_ptes. And then
> >>> mthp_collapse() does the check above:
> >>
> >> Right. I assumed this is implicitly handled by the optimization in 
> >> collapse_scan_pmd:
> >>
> >>         if (enabled_orders != BIT(HPAGE_PMD_ORDER))
> >>                 max_ptes_none = KHUGEPAGED_MAX_PTES_LIMIT;
> >>
> >> But we perform the check a second time.
> >>
> >>>
> >>> nr_occupied_ptes >= nr_ptes - max_ptes_none
> >>>
> >>> So max_ptes_none=0 + 511 present PTEs + one allowed swap PTE won't even
> >>> call collapse_huge_page() for PMD order.
> >>>
> >>> Shouldn't we account for them in the PMD-order check? Something like:
> >>>
> >>> if (is_pmd_order(order))
> >>>       nr_occupied_ptes += unmapped;
> >
> > This solution seems good for a temporary fixup. but longterm we may
> > want something else. I'm still not sure how we plan on supporting
> > swapin without causing creep. So I'd be ok with adding a fix for
> > legacy PMD behavior until we know how to handle mTHP creep correctly.
> >
> >> As an alternative, we could either 1) skip the check there for
> >> pmd order (as the check was already done); or 2) introduce+maintain
> >> a bitmap that tracks non-present PTEs.
> >>
> >> @@ -1475,7 +1477,9 @@ static enum scan_result mthp_collapse(struct 
> >> mm_struct *mm,
> >>                 nr_occupied_ptes = 
> >> bitmap_weight_from(cc->mthp_present_ptes, offset,
> >>                                                       offset + nr_ptes);
> >>
> >> -               if (nr_occupied_ptes >= nr_ptes - max_ptes_none) {
> >> +               /* Check was already done in the caller. */
> >> +               if (is_pmd_order(order) ||
> >> +                   nr_occupied_ptes >= nr_ptes - max_ptes_none) {
> >>                         enum scan_result ret;
> >>
> >>                         collapse_address = address + offset * PAGE_SIZE;
> >>
> >> 2) would probably be cleanest long-term.
> >
> > That would be best for future swapin support in mTHP, but I still
> > don't think it solves the creep issue.
>
> It wouldn't, we'd simply maintain the state we collect + rely on in separate
> bitmaps. On swapin, we'd have to update/refresh bitmaps I guess.

Yeah, I'm saying for the future, it obviously solves this issue here
as well, but if we have positional tracking of the swapout, shared,
and none PTEs, I think we can use this to determine whether the
collapse would lead to creep. If we detect creep would happen it may
be best to automatically collapse to the N+1 (or greater) candidate.
Just thinking outloud here.

>
> > Perhaps we could combine the
> > two bitmaps to determine if it would make the future collapse eligible
> > again? Not sure but ill start thinking about it.
> >
> > Should I send a fixup for this using Lance's solution? Or does Lance
> > want to send a patch out with the fixes tag?
>
> If Lance could send a fixup, explaining the situation, that would be nice.

OK, I'd appreciate that :)

Cheers,
-- Nico

>
> --
> Cheers,
>
> David
>


Reply via email to