On 19.08.25 15:41, Nico Pache wrote:
The following series provides khugepaged with the capability to collapse
anonymous memory regions to mTHPs.
To achieve this we generalize the khugepaged functions to no longer depend
on PMD_ORDER. Then during the PMD scan, we use a bitmap to track chunks of
pages (defined by KHUGEPAGED_MTHP_MIN_ORDER) that are utilized. After the
PMD scan is done, we do binary recursion on the bitmap to find the optimal
mTHP sizes for the PMD range. The restriction on max_ptes_none is removed
during the scan, to make sure we account for the whole PMD range. When no
mTHP size is enabled, the legacy behavior of khugepaged is maintained.
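The bitmap walk described above can be sketched in userspace C. This is only an illustration under stated assumptions, not the code from the series: PMD_ORDER_ASSUMED (9), MIN_ORDER_ASSUMED (2), struct candidate, count_used() and find_candidates() are hypothetical names, the "bitmap" is one byte per chunk for readability, and the eligibility check assumes max_ptes_none is scaled by a right shift.

```c
#include <assert.h>
#include <stddef.h>

#define PMD_ORDER_ASSUMED 9u   /* assumption: 4 KiB pages, 2 MiB PMD */
#define MIN_ORDER_ASSUMED 2u   /* assumption: smallest tracked chunk order */
#define NR_CHUNKS (1u << (PMD_ORDER_ASSUMED - MIN_ORDER_ASSUMED))

/* Hypothetical record of one region chosen for collapse. */
struct candidate {
    unsigned int chunk;   /* index of the first chunk in the region */
    unsigned int order;   /* collapse order chosen for the region */
};

/* Count utilized chunks in [start, start + nr). */
static unsigned int count_used(const unsigned char *used,
                               unsigned int start, unsigned int nr)
{
    unsigned int i, n = 0;

    for (i = 0; i < nr; i++)
        n += used[start + i] != 0;
    return n;
}

/*
 * Try the current order first; if it is not enabled or the region is too
 * empty, split the region in half and retry each half at order - 1.
 * Appends to out[] starting at index n and returns the new count.
 */
static size_t find_candidates(const unsigned char *used, unsigned int start,
                              unsigned int order, unsigned long enabled_orders,
                              unsigned int max_ptes_none,
                              struct candidate *out, size_t n)
{
    unsigned int nr = 1u << (order - MIN_ORDER_ASSUMED);
    unsigned int none_pages =
        (nr - count_used(used, start, nr)) << MIN_ORDER_ASSUMED;

    if ((enabled_orders & (1ul << order)) &&
        none_pages <= (max_ptes_none >> (PMD_ORDER_ASSUMED - order))) {
        out[n].chunk = start;
        out[n].order = order;
        return n + 1;
    }
    if (order == MIN_ORDER_ASSUMED)
        return n;

    n = find_candidates(used, start, order - 1, enabled_orders,
                        max_ptes_none, out, n);
    return find_candidates(used, start + nr / 2, order - 1, enabled_orders,
                           max_ptes_none, out, n);
}
```

Called as find_candidates(used, 0, PMD_ORDER_ASSUMED, enabled_orders, max_ptes_none, out, 0) with out[] sized to NR_CHUNKS, the sketch prefers the largest enabled order that is full enough and only descends when that fails.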
max_ptes_none will be scaled by the attempted collapse order to determine
how full an mTHP must be to be eligible for the collapse to occur. If an
mTHP collapse is attempted, but the range contains swapped-out or shared
pages, we don't perform the collapse. It is now also possible to collapse
to mTHPs without requiring the PMD THP size to be enabled.
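To make the scaling concrete, here is a minimal sketch. It assumes the scaling is a right shift by the difference between the PMD order and the attempted order, and that the PMD order is 9 (4 KiB base pages); scaled_max_ptes_none() and PMD_ORDER_ASSUMED are hypothetical names used only for illustration.

```c
#include <assert.h>

#define PMD_ORDER_ASSUMED 9u   /* assumption: 512 PTEs per PMD */

/*
 * Hypothetical helper: scale the PMD-level max_ptes_none down to the
 * number of empty PTEs tolerated in a region of the given order.
 * Only valid for order <= PMD_ORDER_ASSUMED.
 */
static unsigned int scaled_max_ptes_none(unsigned int max_ptes_none,
                                         unsigned int order)
{
    return max_ptes_none >> (PMD_ORDER_ASSUMED - order);
}
```

Under these assumptions, max_ptes_none=511 tolerates 15 empty PTEs in an order-4 (16-page) region, while max_ptes_none=255 tolerates 7.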
With the default max_ptes_none=511, the code should keep most of its
original behavior. When enabling multiple adjacent (m)THP sizes we need to
set max_ptes_none<=255. With max_ptes_none > HPAGE_PMD_NR/2 you will
experience collapse "creep" and constantly promote mTHPs to the next
available size. This is due to the fact that a collapse will introduce at
least 2x the number of pages, and on a future scan will satisfy the
promotion condition once again.
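The creep arithmetic can be checked with a small sketch. It assumes the same shift-based scaling of max_ptes_none, a PMD order of 9, and that a region is eligible when its empty-PTE count is at most the scaled limit; promotes_after_collapse() and the other names are hypothetical. The scenario is a freshly collapsed, fully populated mTHP of a given order whose buddy half in the next-larger region is entirely empty.

```c
#include <assert.h>

#define PMD_ORDER_ASSUMED 9u   /* assumption: 512 PTEs per PMD */

/* Hypothetical helper: empty PTEs tolerated at the given order. */
static unsigned int scaled_max_ptes_none(unsigned int max_ptes_none,
                                         unsigned int order)
{
    return max_ptes_none >> (PMD_ORDER_ASSUMED - order);
}

/*
 * Returns 1 if a region of order + 1, made of one fully populated
 * order-`order` mTHP and one completely empty half, would again be
 * eligible for collapse. Only valid for order < PMD_ORDER_ASSUMED.
 */
static int promotes_after_collapse(unsigned int max_ptes_none,
                                   unsigned int order)
{
    unsigned int none_in_next = 1u << order;   /* the empty half */

    return none_in_next <=
           scaled_max_ptes_none(max_ptes_none, order + 1);
}
```

With max_ptes_none=511 the half-empty region qualifies at every order, so each scan can promote the previous result; with max_ptes_none=255 the scaled limit is always one PTE short of the empty half, so the promotion stops.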
Patch 1: Refactor/rename hpage_collapse
Patch 2: Some refactoring to combine madvise_collapse and khugepaged
Patch 3-5: Generalize khugepaged functions for arbitrary orders
Patch 6-8: The mTHP patches
Patch 9-10: Allow khugepaged to operate without PMD enabled
Patch 11-12: Tracing/stats
Patch 13: Documentation
Would it be feasible to start with simply not supporting the
max_ptes_none parameter in the first version, just like we won't support
max_ptes_swap/max_ptes_shared in the first version?
That gives us more time to think about how to use/modify the old interface.
For example, I could envision a ratio-based interface, or as discussed
with Lorenzo a simple boolean. We could make the existing max_ptes*
interface backwards compatible then.
That also gives us the opportunity to think about the creep problem
separately.
I'm sure initial mTHP collapse will be valuable even without support for
that weird set of parameters.
Would that be a problem implementation-wise?
But let me think further about the creep problem ... :/
--
Cheers
David / dhildenb