On 21/08/25 8:31 pm, Lorenzo Stoakes wrote:
OK so I noticed in patch 13/13 (!) where you change the documentation that you
essentially state that the whole method used to determine the ratio of PTEs to
collapse to mTHP is broken:

        khugepaged uses max_ptes_none scaled to the order of the enabled
        mTHP size to determine collapses. When using mTHPs it's recommended
        to set max_ptes_none low-- ideally less than HPAGE_PMD_NR / 2 (255
        on 4k page size). This will prevent undesired "creep" behavior that
        leads to continuously collapsing to the largest mTHP size; when we
        collapse, we are bringing in new non-zero pages that will, on a
        subsequent scan, cause the max_ptes_none check of the +1 order to
        always be satisfied. By limiting this to less than half the current
        order, we make sure we don't cause this feedback
        loop. max_ptes_shared and max_ptes_swap have no effect when
        collapsing to a mTHP, and mTHP collapse will fail on shared or
        swapped out pages.

This seems to me to suggest that using
/sys/kernel/mm/transparent_hugepage/khugepaged/max_ptes_none as some means
of establishing a 'ratio' to do this calculation is fundamentally flawed.
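
To make the feedback loop concrete, here is a rough sketch of the arithmetic, not the kernel code. It assumes the per-order limit is max_ptes_none >> (HPAGE_PMD_ORDER - order), which is one reading of "scaled to the order" above. It also assumes each scan collapses to the largest order whose check passes, and that the populated PTEs sit in the lowest-aligned region of every order. The names scaled_max_ptes_none() and simulate_scans() are made up for illustration.

```python
HPAGE_PMD_ORDER = 9  # 4k base pages: 512 PTEs per PMD


def scaled_max_ptes_none(max_ptes_none, order):
    # Assumed scaling: shift the PMD-level tunable down to the mTHP order.
    return max_ptes_none >> (HPAGE_PMD_ORDER - order)


def simulate_scans(max_ptes_none, present, orders=range(2, HPAGE_PMD_ORDER + 1)):
    """Return the order collapsed to on each successive scan.

    `present` is the number of populated PTEs at the start. A collapse to
    order N populates all 2**N PTEs of that region (the "new non-zero
    pages" the documentation text refers to).
    """
    history = []
    current_order = 0
    while True:
        candidates = [
            o for o in orders
            if o > current_order
            and (1 << o) - present <= scaled_max_ptes_none(max_ptes_none, o)
        ]
        if not candidates:
            return history
        current_order = max(candidates)
        present = 1 << current_order
        history.append(current_order)


# Exactly half (256): every collapse satisfies the check for the next order.
print(simulate_scans(256, present=2))  # [2, 3, 4, 5, 6, 7, 8, 9]
# Just under half (255): the first collapse happens, then the loop stops.
print(simulate_scans(255, present=3))  # [2]
```

Under these assumptions a value of exactly HPAGE_PMD_NR / 2 already creeps all the way to PMD order, while 255 does not, which matches the "less than half" wording in the quoted text.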

So surely we ought to introduce a new sysfs tunable for this? Perhaps

/sys/kernel/mm/transparent_hugepage/khugepaged/mthp_max_ptes_none_ratio

Or something like this?

It's already questionable that we are taking a value that is expressed
essentially in terms of PTE entries per PMD and then use it implicitly to
determine the ratio for mTHP, but to then say 'oh but the default value is
known-broken' is just a blocker for the series in my opinion.

This really has to be done a different way I think.

Cheers, Lorenzo

FWIW this was my version of the documentation patch:
https://lore.kernel.org/all/[email protected]/

The discussion about the creep problem started here:
https://lore.kernel.org/all/[email protected]/

The discussion continued here:
https://lore.kernel.org/all/[email protected]/

ending with a summary I gave here:
https://lore.kernel.org/all/[email protected]/

This should help you with the context.


